Abstract

We investigate systems of interacting stochastic differential equations with two kinds of heterogeneity:
one originating from different weights of the linkages, and one concerning their asymptotic
relevance when the system becomes large. To capture these effects we define a partial mean field system,
and prove a law of large numbers with explicit bounds on the mean squared error. Furthermore,
a large deviation result is established under reasonable assumptions. The theory is illustrated
by several examples: on the one hand, we recover the classical results of chaos propagation for homogeneous
systems, and on the other hand, we demonstrate the validity of our assumptions for quite
general heterogeneous networks including those arising from preferential attachment random graph
models.
1 Introduction


The application of mean field theory to large systems of stochastic differential equations (SDEs) was
initiated by McKean's seminal work [26], [27], [28]. In the classical case, an $N$-dimensional interacting
particle system is governed by SDEs of the form

\[
dX_i^N(t) = \frac{1}{N-1} \sum_{j \neq i} \big( X_j^N(t) - X_i^N(t) \big)\, dt + dB_i(t), \quad t \in \mathbb{R}_+, \qquad
X_i^N(0) = X_i(0), \quad i = 1, \ldots, N, \tag{1.1}
\]

with independent starting random variables $X_i(0)$ and independent Brownian motions $B_i$. As the number
of particles increases, the pair dependencies in this coupled system decrease with order $1/N$ such that a
law of large numbers applies (see Theorem 1.4 of [34]). Defining

\[
d\bar{X}_i(t) = \big( E[\bar{X}_i(t)] - \bar{X}_i(t) \big)\, dt + dB_i(t), \quad t \in \mathbb{R}_+, \qquad
\bar{X}_i(0) = X_i(0), \quad i = 1, \ldots, N, \tag{1.2}
\]

there exists for every $T \in \mathbb{R}_+$ a constant $C(T) \in \mathbb{R}_+$ independent of $N$ such that

\[
\sup_{i = 1, \ldots, N} E\Big[ \sup_{t \in [0,T]} |X_i^N(t) - \bar{X}_i(t)|^2 \Big]^{1/2} \le \frac{C(T)}{\sqrt{N}}. \tag{1.3}
\]


In other words, in a large system, the behaviour of a fixed number of particles evolving according to (1.1)
is well described by the so-called mean field system (1.2), where all stochastic processes are stochastically
independent, a phenomenon that is called propagation of chaos. Thus, mean field theory provides a model
simplification by reducing a many-body problem as in (1.1) to a one-body problem as in (1.2) with explicit
$L^2$-estimates on the occurring error. Moreover, it can be shown that the empirical measure of the particles
satisfies a large deviation principle as $N \to \infty$, see [11], [1^. There exists a huge literature dealing with
this or related topics, and we only mention the review papers 0 Is^, where one can also find further 
references. 
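The coupling behind the classical law of large numbers can be checked numerically. The following is a minimal sketch, not the method of any cited reference: it assumes the linear interaction drift of the classical system, standard normal starting values (so that the mean of the limiting process stays at 0), and a plain Euler scheme; the function name and all parameter values are hypothetical. Both systems are driven by the same Brownian increments, and the Monte Carlo estimate of the $L^2$-distance for the first particle should shrink as the number of particles grows.

```python
import math
import random


def simulate_coupling_error(n_particles, horizon=1.0, n_steps=50, n_paths=50, seed=0):
    """Estimate E[sup_t |X_1^N(t) - Xbar_1(t)|^2]^(1/2) by Monte Carlo.

    X^N is the interacting system, Xbar the mean field system; both use the
    same initial values and the same Brownian increments (Euler scheme).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    mean0 = 0.0  # assumed E[X_i(0)]; the mean of Xbar_i(t) stays at this value
    total = 0.0
    for _ in range(n_paths):
        x = [rng.gauss(mean0, 1.0) for _ in range(n_particles)]
        xbar = list(x)
        sup_diff = 0.0
        for _ in range(n_steps):
            s = sum(x)  # sum over all particles at the start of the step
            for i in range(n_particles):
                db = rng.gauss(0.0, math.sqrt(dt))
                # average of the other particles minus own position
                drift_ips = (s - x[i]) / (n_particles - 1) - x[i]
                drift_mf = mean0 - xbar[i]
                x[i] += drift_ips * dt + db
                xbar[i] += drift_mf * dt + db
            sup_diff = max(sup_diff, abs(x[0] - xbar[0]))
        total += sup_diff ** 2
    return math.sqrt(total / n_paths)


if __name__ == "__main__":
    for n in (5, 20, 80):
        print(n, simulate_coupling_error(n))
```

Because the two systems share their noise, the difference is driven only by the fluctuation of the empirical average around the mean, which is what produces the decay in the number of particles.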

The systems (1.1) and (1.2) describe statistically equal or exchangeable particles: any permutation of
the indices $i \in \{1, \ldots, N\}$ leads to a system with the same distribution (cf. [37]). In particle physics such
an assumption is certainly reasonable and underlies many other similar models of mean field type, see
the two treatises [35], [36] for numerous examples.

However, when mean field models are considered in applications other than statistical mechanics, the 
homogeneity assumption may not be appropriate in all situations. For instance, in [3, 22] the processes
(1.1) are used to model the wealth of trading agents in an economy, who are typically far from being equal
in their trading behaviour (there are “market makers” and others). Similarly, the stochastic Cucker-Smale 
model that is considered in [l|, Q describes the “flocking” phenomenon of individuals. Also here it only 
seems natural that one or several “leaders” may have a distinguished role, setting them apart from the 
remaining system. Moreover, in systemic risk modelling the particles represent financial institutions that 
interact with each other through mutual exposures, see [J, 123] for some approaches in this direction. 

The different players in the banking sector vary considerably in size and importance, which is obvious 
from the fact that some banks were considered too big to fail during the financial crisis of 2007-08. 
Further fields of applications where mean field theory is used for interacting particle systems include 
genetic algorithms [l^, neuron modelling (see [l^ and references therein) and epidemics modelling [24].

Partly triggered by the examples in the previous paragraph, this research aims to investigate deviations
from homogeneous systems to heterogeneous systems. First, we allow for different interaction
rates between pairs (instead of $1/(N-1)$ throughout), and second, we permit the subsistence of a core-periphery
structure in the mean field limit, that is, some particles may have a non-vanishing influence
even when the system becomes large. Another restriction we will relax in our analysis concerns the driving
noises of the interacting SDEs: instead of independence we explicitly allow for different degrees of
dependence in the noise terms, even asymptotically. Until now there exists only a very small amount of
literature that generalizes (1.1) in these directions: in [^, [^, the particles are divided into finitely
many groups within which they are homogeneous (and the number of members in each group must tend
to infinity for the law of large numbers), and [8], [10], where one major agent exists and propagation of
chaos for the minor agents is considered conditioned on the major one. Other papers that consider general
heterogeneous systems include [l^, where the propagation of chaos result is assumed, and [IlBlIIl,
where a law of large numbers for the empirical measure is proved under various conditions. Regarding
the last-mentioned papers, two aspects are worth commenting on. First, assuming that finitely many core
particles do exist in the system, their contribution to the empirical distribution becomes less and less
as $N \to \infty$ although their impact may very well stay high. Thus, in this case the empirical distribution
may fail to describe the behaviour of the system as a whole. Second, whereas for homogeneous systems
the convergence of the empirical measure is equivalent to the existence of a mean field limit in the sense
of (1.3) (see e.g. Proposition 2.2(i) of [34]), this is no longer true for heterogeneous systems. For core
particles the left-hand side of (1.3) need not converge to 0 even if the empirical distribution converges,
say, to a deterministic limit. For example, in the case of [8], [10] with one core particle, an unconditional
propagation of chaos result does not hold for this particle without further assumptions (even if it does
for the periphery particles).

Due to the two aforementioned reasons, we will not work with the empirical distribution in this paper
but state and prove mean field limit theorems for the particles on the process level. In Section 2 we start by
introducing the precise interacting particle model we want to investigate. Then we define a corresponding
partial mean field model, for which we prove a law of large numbers type result (Theorem 3.1) with
explicit convergence rates in Section 3. It generalizes (1.3) by taking into account the different kinds
of heterogeneity due to varying pair interaction rates, a distinction between important/core and less
important/periphery pair relationships, and interdependencies between the driving noise terms.

The main difficulty here is to identify the correct rates that govern the distance between the original
system and the mean field approximation. As we will see, a total of twelve rates is required, each expressing
a connectivity property of the underlying interaction and correlation networks. This is inevitable in
contrast to [8], [10] where the stochastic dependencies among the particles are annihilated simply by
conditioning. In order to elucidate the meaning of each rate, we discuss three exemplary situations in
detail. In Section 3.1, in particular in Example 3.4, we show that in the quasi-homogeneous case all twelve
rates typically boil down to a single rate like in (1.3). In Section 3.2 we explain why the prerequisites
for Theorem 3.1 in the heterogeneous case are essentially sparsity assumptions on the particle network,
which are satisfied for instance if this network is generated from a preferential attachment mechanism,
see Section 3.3. In order to show the last statement, we have to derive the asymptotics of the maximal
in- and out-degrees of directed preferential attachment graphs, see Lemma 3.8. This result may be of
independent interest and generalizes the corresponding result for undirected graphs.

The second main result of our paper is a large deviation principle for the difference between the original
system and its partial mean field approximation, which
is presented in Section 4 as Theorem 4.1. In contrast to homogeneous systems, where such a principle is
proved for the empirical measure (see [11], [25]), we work on the process level again and therefore need to
require the existence of all exponential moments. Furthermore, due to heterogeneity, we do not obtain an
explicit formula for the large deviation rate function, but a variational representation as a Fenchel-Legendre
transform. The final Section 5 contains the proofs.


2 The model 

Before we introduce the model we analyze in this paper, we list a number of notations that will be 
employed throughout the paper. 

$\mathbb{R}_+$   the set $[0, \infty)$ of non-negative real numbers;

$\lfloor z \rfloor$   the largest integer smaller than or equal to $z \in \mathbb{R}$;

$\mathbb{N}$   the natural numbers $\{1, 2, \ldots\}$;

$A$, $x$   the typical notation for a matrix $A = (A_{ij} : i, j \in \mathbb{N}) \in \mathbb{R}^{\mathbb{N} \times \mathbb{N}}$ and a vector
   $x = (x_i : i \in \mathbb{N})' \in \mathbb{R}^{\mathbb{N}}$, with all binary relations such as $\le$, or operations
   relying on them such as the absolute value $|\cdot|$ or taking the supremum, being
   understood componentwise when applied to matrices and vectors;

$(\cdot)'$   the transposition operator;

$AB$, $Ax$, $e^A$   matrix-matrix and matrix-vector multiplication and the matrix exponential,
   all defined in analogy to the finite-dimensional case, provided that the involved
   series converge;

$x.y$   the entrywise product $x.y = (x_i y_i : i \in \mathbb{N})'$ for $x, y \in \mathbb{R}^{\mathbb{N}}$;

$|A|_\infty$, $|x|_\infty$   $|A|_\infty := \sup_{i \in \mathbb{N}} \sum_{j \in \mathbb{N}} |A_{ij}|$ and $|x|_\infty := \sup_{i \in \mathbb{N}} |x_i|$ for $A \in \mathbb{R}^{\mathbb{N} \times \mathbb{N}}$ and $x \in \mathbb{R}^{\mathbb{N}}$;

$|A|_d$   $|A|_d := \sup_{i \in \mathbb{N}} |A_{ii}|$ for matrices $A$;

$A^\times$   the matrix $A$ with all diagonal entries set to 0;

$I$   the identity matrix in $\mathbb{R}^{\mathbb{N} \times \mathbb{N}}$ or $\mathbb{R}^{d \times d}$ for some $d \in \mathbb{N}$;

$L^p$   the space $L^p(\Omega, \mathcal{F}, P)$, $p \in [1, \infty]$, endowed with the topology induced by
   $\|X\|_{L^p} := E[|X|^p]^{1/p}$, and to be understood entrywise when applied to matrix-
   or vector-valued random variables;

$E[X]$, $\mathrm{Var}[X]$   componentwise expectation and variance for random variables in $\mathbb{R}^{\mathbb{N} \times \mathbb{N}}$ or $\mathbb{R}^{\mathbb{N}}$;

$\mathrm{Cov}[X, Y]$, $\mathrm{Cov}[X]$   the matrices whose $(i,j)$-th entry is $\mathrm{Cov}[X_i, Y_j]$ and $\mathrm{Cov}[X_i, X_j]$, respectively,
   when $X$ and $Y$ are random vectors;

$x^*$   $x^*(t) := \sup_{s \in [0,t]} |x(s)|$ for $t \in \mathbb{R}_+$ and functions $x \colon \mathbb{R}_+ \to \mathbb{R}$, again considered
   entrywise when $x$ takes values in $\mathbb{R}^{\mathbb{N} \times \mathbb{N}}$ or $\mathbb{R}^{\mathbb{N}}$;

$D_T^d$, $D_T^\infty$   the space of $\mathbb{R}^d$-valued (resp. $\mathbb{R}^{\mathbb{N}}$-valued) functions on $[0, T]$ whose coordinates
   are all càdlàg functions;

$C_T^d$, $C_T^\infty$   elements of $D_T^d$ and $D_T^\infty$ where each coordinate is a continuous function;

$AC_T^d$, $AC_T^\infty$   elements of $D_T^d$ and $D_T^\infty$ where each coordinate is an absolutely continuous
   function;

$\mathcal{D}_T^d$, $\mathcal{D}_T^\infty$   the $\sigma$-field on $D_T^d$ (resp. $D_T^\infty$) generated by the evaluation maps $\pi_t(x) = x(t)$,
   $x \in D_T^d$ (resp. $D_T^\infty$), for $t \in [0, T]$;

$U$, $J$   the uniform topology and the Skorokhod topology on $D_T^d$ and $D_T^\infty$ (in the latter
   case they are defined via the product of the $d$-dimensional topologies);

$M_T^d$   the space of all $(\theta_1, \ldots, \theta_d)$ where each $\theta_i$ is a signed Borel measure on $[0, T]$ of
   finite total variation $|\theta_i|([0, T])$.


Given a stochastic basis $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}(t))_{t \in \mathbb{R}_+}, P)$ satisfying the usual hypotheses of completeness and
right-continuity, we investigate a network described by the following interacting particle system (IPS):

\[
dX_i(t) = \sum_{j=1}^\infty a_{ij}(t) X_j(t)\, dt + \sum_{j=1}^\infty \sigma_{ij}(t) X_j(t-)\, dL_i(t) + \sum_{j=1}^\infty f_{ij}(t)\, dB_j(t)
 + \sum_{j=1}^\infty \rho_{ij}(t)\, dM_j(t), \quad t \in \mathbb{R}_+, \quad i \in \mathbb{N}, \tag{2.1}
\]

subjected to some $\mathcal{F}(0)$-measurable $\mathbb{R}^{\mathbb{N}}$-valued initial condition $X(0)$. We will also use the more compact
form

\[
dX(t) = a(t) X(t)\, dt + \sigma(t) X(t-).dL(t) + f(t)\, dB(t) + \rho(t)\, dM(t), \quad t \in \mathbb{R}_+, \tag{2.2}
\]

for (2.1). The ingredients satisfy the following conditions:

• The two measurable functions $t \mapsto a(t)$ and $t \mapsto \sigma(t)$ are decomposed into $a = a^C + a^P$ and
$\sigma = \sigma^C + \sigma^P$ such that for all $T \in \mathbb{R}_+$ and $i, j \in \mathbb{N}$

\[
A^\diamond_{ij}(T) := \sup_{t \in [0,T]} |a^\diamond_{ij}(t)| < \infty, \quad \Sigma^\diamond_{ij}(T) := \sup_{t \in [0,T]} |\sigma^\diamond_{ij}(t)| < \infty, \quad \diamond \in \{C, P\}. \tag{2.3}
\]

We define $A_{ij}(T) := A^C_{ij}(T) + A^P_{ij}(T)$ and $\Sigma_{ij}(T) := \Sigma^C_{ij}(T) + \Sigma^P_{ij}(T)$.

• $L$ is an $\mathbb{R}^{\mathbb{N}}$-valued $\mathbb{F}$-Lévy process (i.e. an $\mathbb{F}$-adapted Lévy process whose increments are independent
of the past $\sigma$-fields in $\mathbb{F}$) with finite second moment and mean 0.

• $M$ is an $\mathbb{R}^{\mathbb{N}}$-valued square-integrable martingale on any finite time interval, and $B$ is an $\mathbb{R}^{\mathbb{N}}$-valued
predictable process such that each coordinate process is of locally finite variation. We assume that
$B$ and the predictable quadratic variation process $\langle M, M \rangle$ have progressively measurable Lebesgue
densities $b \colon \Omega \times \mathbb{R}_+ \to \mathbb{R}^{\mathbb{N}}$ and $c \colon \Omega \times \mathbb{R}_+ \to \mathbb{R}^{\mathbb{N} \times \mathbb{N}}$.

• $f$ is the sum of two deterministic measurable functions $f^C, f^P \colon \mathbb{R}_+ \to \mathbb{R}^{\mathbb{N} \times \mathbb{N}}$ and $\rho$ the sum of two
predictable processes $\rho^C, \rho^P \colon \Omega \times \mathbb{R}_+ \to \mathbb{R}^{\mathbb{N} \times \mathbb{N}}$.

Of course, the stochastic integrals behind (2.2) must make sense: each single integral must be well
defined and the infinite sums must converge in an appropriate sense. A sufficient condition for the existence
of the infinite-dimensional integral is the existence of the one-dimensional ones plus the summability of
their $L^2$-norms.

Next, we shall explain the rationale behind the IPS model (2.2) and the specific choices for the involved
processes. By the definition given in (2.1), the processes $(X_i : i \in \mathbb{N})'$ are coupled in two ways in general:
first, they interact internally with each other through a drift term (determined by $a$) and a volatility
term (determined by $\sigma$ in conjunction with $L$); and second, they are exposed to the same external forces
(given by $B$ and $M$), where $f$ and $\rho$ determine the level of influence these noises have on the particles.
In particular, by tuning the parameters $a$, $\sigma$, $f$ and $\rho$ appropriately, one obtains a large range of possible
dependence structures for the model (2.2).

The question this paper aims to attack is how and to which degree the complexity of the high-dimensional
IPS (2.2) can be reduced. Of course, if all entries of the matrices $a$, $\sigma$, $f$ and $\rho$ are zero or
large, there is no hope of simplifying the model. Therefore, our focus lies on particle networks where only
a small number of pairs have strong interaction, while the majority of links in the system are relatively
weak. This is implemented in the decomposition of $a$, $\sigma$, $f$ and $\rho$ into a core matrix (superscript $C$) and
a periphery matrix part (superscript $P$). It is important to notice that our distinction between core and
periphery is not made on the basis of the particles, but on the linkages between them. This allows for
greater modelling flexibility since it includes multi-tier networks in our analysis.
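To make the link-based decomposition concrete, here is a minimal sketch that splits a finite interaction matrix into a core and a periphery part. The magnitude threshold is purely an illustrative assumption: the text does not prescribe any rule for assigning a link to the core or the periphery, and the function name `split_core_periphery` and the example weights are hypothetical.

```python
def split_core_periphery(matrix, threshold):
    """Split a matrix entrywise into core + periphery parts.

    Entries with absolute value >= threshold are treated as core links,
    all others as periphery links (an assumed, illustrative rule).
    """
    core = [[v if abs(v) >= threshold else 0.0 for v in row] for row in matrix]
    periphery = [[v if abs(v) < threshold else 0.0 for v in row] for row in matrix]
    return core, periphery


# Four particles: weak links of size 0.05, two strong links, strong self-terms.
a = [
    [-1.0, 0.8, 0.05, 0.05],
    [0.05, -1.0, 0.05, 0.05],
    [0.9, 0.05, -1.0, 0.05],
    [0.05, 0.05, 0.05, -1.0],
]
a_core, a_periphery = split_core_periphery(a, threshold=0.5)
```

In this example particle 1 takes part in strong links (towards particles 2 and 3) and in weak links at the same time, which is exactly the situation that a particle-based classification could not represent.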

In the presence of non-negligible pair interactions it is natural to apply the mean field limit only to the
links encoded by the periphery matrices. Therefore, we propose the following partial mean field system
(PMFS) as an approximation to the IPS (2.2):

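\[
d\bar{X}(t) = \big( a^C(t) \bar{X}(t) + a^P(t) E[\bar{X}(t)] \big)\, dt + \big( \sigma^C(t) \bar{X}(t-) + \sigma^P(t) E[\bar{X}(t)] \big).dL(t)
 + f^C(t) b(t)\, dt + f^P(t) E[b(t)]\, dt + \rho^C(t)\, dM(t), \quad t \in \mathbb{R}_+, \qquad
\bar{X}(0) = X(0). \tag{2.4}
\]

Written for each row $i \in \mathbb{N}$, this is equivalent to:

\[
d\bar{X}_i(t) = \sum_{j=1}^\infty \big( a^C_{ij}(t) \bar{X}_j(t) + a^P_{ij}(t) E[\bar{X}_j(t)] \big)\, dt + \sum_{j=1}^\infty \big( \sigma^C_{ij}(t) \bar{X}_j(t-) + \sigma^P_{ij}(t) E[\bar{X}_j(t)] \big)\, dL_i(t)
 + \sum_{j=1}^\infty \big( f^C_{ij}(t) b_j(t) + f^P_{ij}(t) E[b_j(t)] \big)\, dt + \sum_{j=1}^\infty \rho^C_{ij}(t)\, dM_j(t), \quad t \in \mathbb{R}_+, \qquad
\bar{X}_i(0) = X_i(0). \tag{2.5}
\]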

It is clear that a priori there is no reason for (2.4) to be a good approximation for (2.2). Therefore,
in the next section, we will give precise $L^2$-estimates in terms of the model coefficients for the difference
between the IPS and the PMFS. Moreover, we will determine conditions under which this difference
becomes small such that we can indeed speak of a law of large numbers.


Partial mean field limits in heterogeneous networks


3 Law of large numbers 

The first main result of this paper assesses the distance between the original IPS (2.2) and the PMFS
(2.4). To formulate this we have to introduce some further notation. For $T \in \mathbb{R}_+$ we define


(T) :=sup^A„(T), 

VL ■■= sup ||Li(l)||i, 2 , 
igN 


Wa,d('P) := supA^(T), 

igN 

Ub(r) := sup sup \\hi 
igN tg[0.T] 


1L2, 


,y{T) ■= sup^Eij(T), 


*gN 

vx ■■= sup||X,( 0 )||l 2 , 
igN 


z;/(T):=sup sup Wl + 

iGN tG[0,T] 

( oo 

j,k=i 


1/2 


(3.1) 


and introduce the rates 


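\begin{align*}
r_1(T) &:= \big| A^P(T)\, |\mathrm{Cov}[X(0)]|\, (A^P(T))' \big|_d^{1/2}, &
r_2(T) &:= \big| \Sigma^P(T)\, |\mathrm{Cov}[X(0)]|\, (\Sigma^P(T))' \big|_d^{1/2}, \\
r_3(T) &:= \big| A^P(T)\, |\mathrm{Cov}[L(1)]|\, (A^P(T))' \big|_d^{1/2}, &
r_4(T) &:= \big| \Sigma^P(T)\, |\mathrm{Cov}[L(1)]|\, (\Sigma^P(T))' \big|_d^{1/2}, \\
r_5(T) &:= \sup_{t \in [0,T]} \big| f^P(t)\, \mathrm{Cov}[b(t)]\, (f^P(t))' \big|_d^{1/2}, &
r_6(T) &:= \sup_{t \in [0,T]} \big| E\big[ \rho^P(t) c(t) (\rho^P(t))' \big] \big|_d^{1/2}, \\
r_7(T) &:= \big| A^P(T) A^C(T)^\times \big|_\infty, &
r_8(T) &:= \big| \Sigma^P(T) A^C(T)^\times \big|_\infty, \\
r_9(T) &:= \sup_{s,t \in [0,T]} \big| A^P(T)\, \big| f^C(s)\, \mathrm{Cov}[b(s), b(t)]\, (f^C(t))' \big|\, (A^P(T))' \big|_d^{1/2}, \\
r_{10}(T) &:= \sup_{s,t \in [0,T]} \big| \Sigma^P(T)\, \big| f^C(s)\, \mathrm{Cov}[b(s), b(t)]\, (f^C(t))' \big|\, (\Sigma^P(T))' \big|_d^{1/2}, \\
r_{11}(T) &:= \sup_{t \in [0,T]} \big| A^P(T)\, \big| E\big[ \rho^C(t) c(t) (\rho^C(t))' \big] \big|\, (A^P(T))' \big|_d^{1/2}, \\
r_{12}(T) &:= \sup_{t \in [0,T]} \big| \Sigma^P(T)\, \big| E\big[ \rho^C(t) c(t) (\rho^C(t))' \big] \big|\, (\Sigma^P(T))' \big|_d^{1/2}. \tag{3.2}
\end{align*}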


Theorem 3.1. Fix some $T \in \mathbb{R}_+$ and grant the general model assumptions as given in Section 2.
Furthermore, assume that each of the numbers in (3.1) is finite. Then (2.2) and (2.4) have a pathwise
unique solution $X$ and $\bar{X}$, respectively, and there exist constants $K(T)$ and $K_\iota(T)$, $\iota = 1, \ldots, 12$, which
depend on the model coefficients only through the numbers in (3.1), such that

\[
\sup_{i \in \mathbb{N}} \big\| (X_i - \bar{X}_i)^*(T) \big\|_{L^2} \le K(T) \sum_{\iota = 1}^{12} K_\iota(T) r_\iota(T). \tag{3.3}
\]


The proof of Theorem 3.1 will be given in Section 5. Compared to the homogeneous case of [34],
we have to take care of several kinds of heterogeneous dependencies in the system: different weights on
the edges, the distinction between core and periphery links, and possibly dependent driving noises. This
explains why we have twelve rates in contrast to a single one in (1.3).
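For a finite network the rates can be evaluated directly from the matrix norms of Section 2. The sketch below is illustrative only and covers two of the twelve rates, read as $r_1(T) = |A^P(T)\,|\mathrm{Cov}[X(0)]|\,(A^P(T))'|_d^{1/2}$ and $r_7(T) = |A^P(T) A^C(T)^\times|_\infty$; all function names are hypothetical, and the example network (periphery weights $1/N$, a diagonal core, uncorrelated unit-variance starting values) is an assumption chosen to mimic the homogeneous case.

```python
import math


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def abs_entries(a):
    """Entrywise absolute value |A|."""
    return [[abs(v) for v in row] for row in a]


def norm_d(a):
    """|A|_d: largest absolute diagonal entry."""
    return max(abs(a[i][i]) for i in range(len(a)))


def norm_inf(a):
    """|A|_infinity: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def off_diagonal(a):
    """A with all diagonal entries set to 0."""
    return [[0.0 if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(a)]


def rate_r1(a_p, cov_x0):
    inner = mat_mul(mat_mul(a_p, abs_entries(cov_x0)), transpose(a_p))
    return math.sqrt(norm_d(inner))


def rate_r7(a_p, a_c):
    return norm_inf(mat_mul(a_p, off_diagonal(a_c)))


def homogeneous_example(n):
    """Assumed test network: weights 1/n off the diagonal, diagonal core."""
    a_p = [[0.0 if i == j else 1.0 / n for j in range(n)] for i in range(n)]
    a_c = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    cov = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return a_p, a_c, cov
```

For this example the first rate equals $\sqrt{N-1}/N$, which is of order $1/\sqrt{N}$, and the second rate vanishes because the core matrix has no off-diagonal entries.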


Remark 3.2 Our calculations furnish the following constants in (13.3|) : 

K{T) := V2exffi{ffi/^VaiT) + 2vffiT)vL)^T), 


(3.4) 
and


where 


Ki{T) 

:= E{T)T, 

K2{T) 

KsiT) 

:= ‘^E{T)v^{T)V{T)T^/^, 

Ka{T) 

KsiT) 

:= T, 

MT) 

Kr{T) 

:= ]^E{T)V{T)T\ 

Ks{T) 

K9{T) 

:= \e{T)T\ 

Kio{T) 

KiiiT) 

:= ^E{T)T^/^, 

0 

Ki2(T) 


2 

7 !' 


E(T) := V(T) := ^e(ya(T)E^^+ 2 vj,v^(T)fT vf{T)vb{T)T + 2vp^MiT)T^/'^'^ . 


□ 


Remark 3.3 There are several possibilities to extend Theorem 3.1 without substantially new arguments.

(1) It is straightforward to show that Theorem 3.1 can be extended to the case where the interaction
matrices $a$ and $\sigma$ are replaced by (still deterministic but possibly history-dependent) linear functionals.

(2) Suppose that L = TL^ with some matrix T € ]^NxN some other Levy process with finite 

variance and mean zero. Furthermore, F = F^ -|- F^ and accordingly and = F^L°. 

What one would like to do when passing to the PMFS (12.41) is to replace L there by L^. How does 
this affect the estimate (13.31) in Theorem 13.11 ? A similar analysis as for Theorem 13.11 reveals that an 
extra rate 

ri3 := rPCov[L(l)](rP)' 

d 

appears with constant K 13 := 2 va{T)V. 

(3) Two further generalizations are discussed in Remark 3.5 and Remark 3.7 below. □

It is obvious that the usefulness of Theorem 3.1 depends on the sizes of the rates in (3.2): only if they
are small is the PMFS (2.4) a good approximation to the IPS (2.2). Moreover, there are two different
views on Theorem 3.1: first, if we assume that the underlying network of the IPS is static, it gives an upper
bound on the $L^2$-error when the IPS is approximated by the PMFS; and second, if the interaction network
(i.e. $a$, $\sigma$, $f$ and $\rho$) is assumed to evolve according to an index $N \in \mathbb{N}$, Theorem 3.1 gives conditions under
which the PMFS converges in the $L^2$-sense to the IPS when $N \to \infty$ (this happens precisely when all
rates in (3.2) converge to 0 as $N \to \infty$, and the numbers in (3.1) are majorized independently of $N$). It
is also this second point of view that is the traditional one in mean field analysis and that justifies the
title “Law of large numbers” for the current section.

In the following subsections we will study three examples of dynamical networks and the corresponding 
conditions for the law of large numbers to hold for the PMFS. 


3.1 Propagation of chaos 

We first discuss the phenomenon of chaos propagation, and our results will particularly extend the results
of [ 1 ^, Section 17.3, [ 2 ^, Corollary 4.1, and [34], Theorem 1.4 by including inhomogeneous weights in
the model. The setting is as follows:

(1) The underlying network changes with $N \in \mathbb{N}$. In particular, we will index $X$ and $\bar{X}$, the coefficients
$a$, $\sigma$, $f$ and $\rho$ as well as the rates in (3.2) by $N$.

(2) All structural assumptions in Section 2 hold and the numbers in (3.1), some of which now depend on
$N$, are uniformly bounded in $N$.

(3) The core matrices $a^{N,C}(t)$, $\sigma^{N,C}(t)$, $f^{N,C}(t)$ and $\rho^{N,C}(t)$ are diagonal matrices for all times $t \in \mathbb{R}_+$.

(4) For each $N \in \mathbb{N}$, $(L_i, b_i, M_i, X_i^N(0) : i \in \mathbb{N})$ is a sequence of independent random elements (note
that the noises indexed by a fixed $i$ may depend on each other).

(5) For each $T \in \mathbb{R}_+$ the following rates converge to 0 as $N \to \infty$:



/oo \ 

1/2 

rf(T) :=sup| 


■■= sup sup 


) 

' ieNtgfO.T] \ 


/oo \ 

1/2 

r^{T) := sup I 


1 : rf (T) := sup sup 



' ieNtgfO.T] \ 


1/2 


1/2 




These hypotheses ensure that all pair dependencies between the processes $X_i^N$, $i \in \mathbb{N}$, vanish when
$N \to \infty$. As a result, in the PMFS, the independence of the particles $i$ at $t = 0$ propagates through all
times $t > 0$: the PMFS decouples in contrast to the original IPS.


Example 3.4 In classical mean field theory as in the references mentioned in the introduction, the
$N$-th network consists of exactly $N$ particles. In other words, $a^N_{ij}$, $\sigma^N_{ij}$, $f^N_{ij}$, $\rho^N_{ij}$ and $X_i^N(0)$ are all 0 for
$i > N$ or $j > N$. Moreover, all pair interaction is assumed to be of order $1/N$, that is, we have for each
$T \in \mathbb{R}_+$

\[
A^{N,P}_{ij}(T) = \frac{A_{ij}(T)}{N}, \quad \Sigma^{N,P}_{ij}(T) = \frac{\Sigma_{ij}(T)}{N}, \quad i, j \in \mathbb{N}, \tag{3.5}
\]

where $A_{ij}(T), \Sigma_{ij}(T) \in \mathbb{R}_+$ are uniformly bounded in $i, j \in \mathbb{N}$. Furthermore, the driving noises are
supposed to be independent for different particles and to enter the PMFS completely. This means that
(3) and (4) hold and that $f^{N,P} = \rho^{N,P} = 0$. It is easily shown that under these specifications the rates in
(5) above converge to 0 as $N \to \infty$: $r_f^N(T)$ and $r_\rho^N(T)$ are simply 0, and $r_a^N(T)$ and $r_\sigma^N(T)$ are of order
$1/\sqrt{N}$ as $N \to \infty$. □


We still need to show that under assumptions (1)-(5) above, all rates $r_\iota^N(T)$, $\iota = 1, \ldots, 12$, converge
to 0 as $N \to \infty$. Since $A^{N,C}(T)$ is diagonal, we have $A^{N,C}(T)^\times = 0$, and since the driving noises for
different particles are independent, all covariances (or covariations) vanish outside the diagonal. Thus,
we have

\begin{align*}
r_1^N(T) &\le v_X r_a^N(T), & r_2^N(T) &\le v_X r_\sigma^N(T), & r_3^N(T) &\le v_L r_a^N(T), \\
r_4^N(T) &\le v_L r_\sigma^N(T), & r_5^N(T) &\le v_b(T) r_f^N(T), & r_6^N(T) &= r_\rho^N(T), \\
r_7^N(T) &= 0, & r_8^N(T) &= 0, & r_9^N(T) &\le v_b(T) v_f(T) r_a^N(T), \\
r_{10}^N(T) &\le v_b(T) v_f(T) r_\sigma^N(T), & r_{11}^N(T) &\le v_{\rho,M}(T) r_a^N(T), & r_{12}^N(T) &\le v_{\rho,M}(T) r_\sigma^N(T),
\end{align*}

which all converge to 0 as $N \to \infty$ by hypothesis. The following remark continues Remark 3.3 regarding
further generalizations of Theorem 3.1.


Remark 3.5 In the setting of this subsection there are actually no core relationships between different
particles: every pair interaction rate tends to 0 with large $N$. If we even assume that there is no dependence
at all originating from the noises (i.e. $f^{N,P} = \rho^{N,P} = 0$ above), the propagation of chaos result can easily
be extended to nonlinear Lipschitz interaction terms (suitably bounded in $N$) instead of the matrices
$a^N$ and $\sigma^N$. As a matter of fact, the classical method of [34], Theorem 1.4, can be applied with obvious
changes. □
3.2 Sparse interaction versus sparse correlation

The propagation of chaos result in the last subsection was based on two core hypotheses: asymptotically
vanishing pair interaction rates and the independence of the particles’ driving noises. The motivation for
establishing Theorem 3.1, however, is to deal with situations where these two conditions are precisely not
satisfied, that is, when the coefficients $a$, $\sigma$, $f$ and $\rho$ of (2.2) are decomposed into a core and a periphery
part in a non-trivial way. In fact, in this subsection we discuss a typical situation where the full generality
of Theorem 3.1 is required. Before that, we recall that we consider networks indexed by $N \in \mathbb{N}$, and that
we are interested in the cases when the rates in (3.2) vanish when $N$ becomes large.


General assumptions 


The following list of hypotheses describes the setting in this subsection.

(1) The statements (1) and (2) of Section 3.1 hold.

(2) $M$ is an $\mathbb{R}^{\mathbb{N}}$-valued $\mathbb{F}$-Lévy process, implying that $c_{ij}(t) = \mathrm{Cov}[M_i(1), M_j(1)]$.

(3) At stage $N$, the system consists of $N_0 + N$ particles with some fixed $N_0 \in \mathbb{N}$, that is, we have
$a^N_{ij} = \sigma^N_{ij} = f^N_{ij} = \rho^N_{ij} = X_i^N(0) = 0$ as soon as $i > N_0 + N$ or $j > N_0 + N$.

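(4) $\mathcal{C} := \{1, \ldots, N_0\}$ contains the core particles, $\mathcal{P}^N := \{N_0 + 1, \ldots, N_0 + N\}$ the periphery particles, whose
number increases with $N$. Correspondingly, $a^{N,C}$ and $\sigma^{N,C}$ (resp. $a^{N,P}$ and $\sigma^{N,P}$) characterize the
influence of the core (resp. periphery) particles in the system. In other words, $j \in \mathcal{C}$ implies that
$a^{N,P}_{ij}(t) = \sigma^{N,P}_{ij}(t) = 0$ for all $i \in \mathbb{N}$ and $t \in \mathbb{R}_+$, while $j \in \mathcal{P}^N$ implies $a^{N,C}_{ij}(t) = \sigma^{N,C}_{ij}(t) = 0$ for all
$i \neq j$ and $t \in \mathbb{R}_+$. We assume that the diagonals of $a^N$ and $\sigma^N$ are completely contained in $a^{N,C}$ and
$\sigma^{N,C}$, respectively. It follows that the partitions of $a^N$ and $\sigma^N$ can be illustrated as (omitting all zero
rows and columns, using $*$ for all potentially non-zero elements, and listing the $N_0$ core particles first,
followed by the $N$ periphery particles):

\[
a^{N,C}, \sigma^{N,C}: \quad
\begin{pmatrix}
* & \cdots & * & 0 & \cdots & \cdots & 0 \\
\vdots & & \vdots & \vdots & & & \vdots \\
* & \cdots & * & 0 & \cdots & \cdots & 0 \\
* & \cdots & * & * & 0 & \cdots & 0 \\
\vdots & & \vdots & 0 & \ddots & \ddots & \vdots \\
\vdots & & \vdots & \vdots & \ddots & \ddots & 0 \\
* & \cdots & * & 0 & \cdots & 0 & *
\end{pmatrix},
\qquad
a^{N,P}, \sigma^{N,P}: \quad
\begin{pmatrix}
0 & \cdots & 0 & * & \cdots & \cdots & * \\
\vdots & & \vdots & \vdots & & & \vdots \\
0 & \cdots & 0 & * & \cdots & \cdots & * \\
0 & \cdots & 0 & 0 & * & \cdots & * \\
\vdots & & \vdots & * & \ddots & \ddots & \vdots \\
\vdots & & \vdots & \vdots & \ddots & \ddots & * \\
0 & \cdots & 0 & * & \cdots & * & 0
\end{pmatrix}.
\]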


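(5) There is a finite number of systematic noises, namely $B_1, \ldots, B_{N_\infty}$ and $M_1, \ldots, M_{N_\infty}$ for some fixed
$N_\infty \in \mathbb{N}$ independent of $N$, that are important to a large part of the system, and there are idiosyncratic
noises $B_{N_\infty + i}$ and $M_{N_\infty + i}$ that only affect the specific particle $i \in \{1, \ldots, N\}$. Thus, we
assume for all $i = 1, \ldots, N$ and $t \in \mathbb{R}_+$ that $\rho^{N,P}_{ij}(t) = f^{N,P}_{ij}(t) = 0$ for $j \in \{1, \ldots, N_\infty\} \cup \{N_\infty + i\}$
and $\rho^{N,C}_{ij}(t) = f^{N,C}_{ij}(t) = 0$ for the other values of $j$. Hence, $f^N$ and $\rho^N$ are of the form (listing the
$N_\infty$ systematic noises first, followed by the $N$ idiosyncratic noises):

\[
f^{N,C}, \rho^{N,C}: \quad
\begin{pmatrix}
* & \cdots & * & * & 0 & \cdots & 0 \\
\vdots & & \vdots & 0 & \ddots & \ddots & \vdots \\
\vdots & & \vdots & \vdots & \ddots & \ddots & 0 \\
* & \cdots & * & 0 & \cdots & 0 & *
\end{pmatrix},
\qquad
f^{N,P}, \rho^{N,P}: \quad
\begin{pmatrix}
0 & \cdots & 0 & 0 & * & \cdots & * \\
\vdots & & \vdots & * & \ddots & \ddots & \vdots \\
\vdots & & \vdots & \vdots & \ddots & \ddots & * \\
0 & \cdots & 0 & * & \cdots & * & 0
\end{pmatrix}.
\]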


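(6) We have for all $T \in \mathbb{R}_+$

\[
A^{N,P}_{ij}(T) = \frac{\phi^N_{ij}(T)}{R_a^N}, \quad \Sigma^{N,P}_{ij}(T) = \frac{\psi^N_{ij}(T)}{R_\sigma^N}, \quad i, j = 1, \ldots, N, \tag{3.6}
\]

where the rates $R_a^N, R_\sigma^N \in \mathbb{R}_+$ satisfy

\[
\frac{R_a^N}{\sqrt{N}} \to \infty, \quad \frac{R_\sigma^N}{\sqrt{N}} \to \infty, \quad \text{as } N \to \infty, \tag{3.7}
\]

and the numbers $\phi^N_{ij}(T), \psi^N_{ij}(T) \in \mathbb{R}_+$ satisfy

\[
\phi(T) := \sup_{i,j,N \in \mathbb{N}} \phi^N_{ij}(T) < \infty, \quad \psi(T) := \sup_{i,j,N \in \mathbb{N}} \psi^N_{ij}(T) < \infty.
\]

Note that we always have $\phi^N_{ii}(T) = \psi^N_{ii}(T) = 0$.

(7) For different $i, j \in \mathbb{N}$, the noises $M_i$ and $M_j$ as well as $B_i$ and $B_j$ are uncorrelated.

(8) The rates $r_f^N(T)$ and $r_\rho^N(T)$ from Section 3.1 converge to 0 as $N \to \infty$ for all $T \in \mathbb{R}_+$.

(9) For each $N \in \mathbb{N}$, the initial values $(X_i^N(0) : i \in \mathcal{P}^N)$ are mutually uncorrelated.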

Conditions (4) and (5) determine the core-periphery structure of the IPS. In practice, a fixed distinction
between core and periphery particles is often not possible because a large number of particles may
be engaged in some strong and some weak linkages at the same time. As already pointed out, this does
not affect the applicability of Theorem 3.1 since the concept of core and periphery refers to the linkages
there. The choice of fixed core and periphery particles in this subsection is only a special case thereof,
intended to simplify the arguments below. Next, regarding (6), one can take $R_a^N = R_\sigma^N = N$ for concreteness,
which can then be compared with Section 3.1. Furthermore, let us point out that assumption (7) is only
for convenience (namely that $f^N$ and $\rho^N$ carry the whole correlation structure of the noises). Indeed, it
is always possible (under our second-moment conditions) to replace any stochastic integral $\rho \cdot M$, where
$M$ is a Lévy process with an arbitrary correlation structure, by $\rho' \cdot M'$ where $M'$ consists of mutually
uncorrelated Lévy processes (of course, (8) would change accordingly). Finally, if $X^N(0)$ is independent of
the driving noises, (9) can be enforced simply by switching to the conditional distribution given $X^N(0)$.

Under (1)-(9) it is easy to prove that the rates $r_1^N(T)$, $r_2^N(T)$, $r_5^N(T)$ and $r_6^N(T)$ converge to 0 when
$N \to \infty$. For the latter two, this can be deduced in the same way as in Section 3.1 because the driving
noises of different particles are uncorrelated. For the other two, we use that the starting random variables
of periphery particles are assumed to be uncorrelated. Hence, we have by (3.7), as $N \to \infty$, that

\[
r_1^N(T) = \sup_{i \in \mathbb{N}} \Big( \sum_{j \in \mathcal{P}^N} \big( A^{N,P}_{ij}(T) \big)^2 \mathrm{Var}[X_j^N(0)] \Big)^{1/2} \le \phi(T) v_X \frac{\sqrt{N}}{R_a^N} \to 0,
\]

\[
r_2^N(T) = \sup_{i \in \mathbb{N}} \Big( \sum_{j \in \mathcal{P}^N} \big( \Sigma^{N,P}_{ij}(T) \big)^2 \mathrm{Var}[X_j^N(0)] \Big)^{1/2} \le \psi(T) v_X \frac{\sqrt{N}}{R_\sigma^N} \to 0.
\]

However, the nine conditions above are in general not sufficient to imply the smallness of the other
rates in (3.2). We need to add extra hypotheses.


Sparseness assumptions 

For each of the remaining rates, we further examine what type of conditions are needed to make them
asymptotically small. As we shall see, it is always a mixture of a sparseness condition on the interaction
matrices $a^N$ and $\sigma^N$ and a sparseness condition on the correlation matrices $f^N$ and $\rho^N$.

$r_3^N(T)$ and $r_4^N(T)$: We first present a counterexample to show that we have to require further conditions.
Consider the simple case where $L_i = L_1$ for all $i \in \mathbb{N}$ and that $A^{N,P}_{ij}(T) = 1/R_a^N$ for all $T \in \mathbb{R}_+$ and
$i, j \in \{1, \ldots, N_0 + N\}$ with $i \neq j$. Then

\[
r_3^N(T) = \sup_{i \in \mathbb{N}} \Big( \sum_{j,k \in \mathcal{P}^N} \Big( \frac{1}{R_a^N} \Big)^2 \mathrm{Cov}[L_j(1), L_k(1)] \Big)^{1/2} = v_L \frac{N}{R_a^N},
\]

which need not converge to 0 in general. A similar calculation can be done for $r_4^N(T)$. In order to make
the rates $r_3^N(T)$ and $r_4^N(T)$ small, there are basically two options: we require the interaction matrices
$A^{N,P}(T)$ and $\Sigma^{N,P}(T)$ to be sparse, or we require the correlation matrix of $L$ to be sparse. Any other possibility
is a suitable combination of these two.
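The effect of noise correlation on the third rate can be seen in a small numerical sketch. The setup below is an assumption made for illustration only: a finite system of $n$ periphery particles without core particles, periphery weights $1/R$ off the diagonal, unit variance of $L_i(1)$, and the rate read as $|A^P\,|\mathrm{Cov}[L(1)]|\,(A^P)'|_d^{1/2}$; the function names are hypothetical.

```python
import math


def rate_r3(a_p, cov_l):
    """sqrt of the largest diagonal entry of A^P |Cov[L(1)]| (A^P)'."""
    n = len(a_p)
    best = 0.0
    for i in range(n):
        total = 0.0
        for j in range(n):
            for k in range(n):
                total += a_p[i][j] * abs(cov_l[j][k]) * a_p[i][k]
        best = max(best, abs(total))
    return math.sqrt(best)


def periphery_matrix(n, rate):
    """Weights 1/rate between distinct particles, 0 on the diagonal."""
    return [[0.0 if i == j else 1.0 / rate for j in range(n)] for i in range(n)]


def compare_noise_structures(n):
    """Return (r3 with identical noises, r3 with uncorrelated noises) for R = n."""
    a_p = periphery_matrix(n, rate=float(n))
    fully_correlated = [[1.0] * n for _ in range(n)]
    uncorrelated = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return rate_r3(a_p, fully_correlated), rate_r3(a_p, uncorrelated)
```

With $R = n$ the rate equals $(n-1)/n$ when all particles share one noise, so it stays close to 1, whereas it equals $\sqrt{n-1}/n$ for uncorrelated noises and therefore vanishes as $n$ grows.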

(10a) The noises $(L_i : i \in \mathcal{P}^N)$ corresponding to periphery particles only have sparse correlation (which, in
particular, includes the case of mutual independence as in Section 3.1). More precisely, we require

\[
p_L^N := \#\{ (i,j) \in \mathcal{P}^N \times \mathcal{P}^N : \mathrm{Cov}[L_i(1), L_j(1)] \neq 0 \} = o\big( (R_a^N)^2 \wedge (R_\sigma^N)^2 \big) \tag{3.8}
\]


for large N. Then 


rf {T) = sup 

iGN 



1/2 


A^.^{T)A^i:^{T)GoY\L,{l),L,{l)] 


< 4>{T)vl 



and, similarly, r^(T) —>• 0 as —>• oo. 

(10b) The matrices $A^{N,P}(T)$ and $\Sigma^{N,P}(T)$, which describe the influence of periphery particles on the system,
are only sparsely occupied, in the sense that every particle in the system is only affected by a small
number of periphery particles. In mathematical terms this condition reads as


phiT) ■■= 

P^{T) := 


sup#{j G A^’P(T) ^ 0} = o«). 


sup#{jGP^:E^’'’(T)^0} = o(A^). 


(3.9) 


In this case, we get 


r3^(r)=sup 

iGN 


1/2 

^ A^/{T)Af/{T)CoY[L,{l),Lk{l)]] < 

\j,kev^ 


r,N 

Pa,i 

jdN 

Ua 


and similarly r|^(T) 0 as iV oo. 


$r_7^N(T)$ and $r_8^N(T)$: These two rates express the connectivity between core and periphery particles. In
general, they will not become small with large $N$. For instance, if $A^{N,C}_{ij}(T) = 1$ for all $j \in \mathcal{C}$ and $i \neq j$,
and $A^{N,P}_{ij}(T) = 1/R_a^N$ for all $j \in \mathcal{P}^N$ and $i \neq j$, then

\[
r_7^N(T) = \sup_{i \in \mathbb{N}} \sum_{j \in \mathcal{P}^N} \sum_{k \in \mathcal{C}} A^{N,P}_{ij}(T) A^{N,C}_{jk}(T) = N_0 \frac{N}{R_a^N}
\]

does not necessarily converge to 0. An analogous statement holds for $r_8^N(T)$. For $r_7^N(T), r_8^N(T) \to 0$
we have to require that the lower left block of $A^{N,C}(T)$, which describes the influence of core particles on
periphery particles, or the matrices $A^{N,P}(T)$ and $\Sigma^{N,P}(T)$, which describe the influence of periphery
particles on the system, be sparse (or a combination thereof):


(11a) The influence of core on periphery particles is sparse. In other words, we suppose the following for the maximal number of periphery particles that a single core particle interacts with through the drift:

:= sup #{* G : A^>''(r) ^ 0} = o(A^ A i?^). 
jec 


Then, 


r.N 


r(y(r)=sup 

iGN 


EE 

jg-pN fegC 


A^/{T)AJ^^{T) < NoHT)va{T)?^ 


as well as $r_8^N(T) \to 0$ as $N \to \infty$.


(3.10) 
(11b) $A^{N,P}(T)$ and $\Sigma^{N,P}(T)$ are sparse in the sense of (3.9). Then $r_7^N(T), r_8^N(T) \to 0$ follow similarly.


$r_9^N(T)$, $r_{10}^N(T)$, $r_{11}^N(T)$ and $r_{12}^N(T)$: Similar considerations as before show that these four rates do not converge to 0 in general. Instead, we again need to require some mixture of sparsely correlated driving noises and sparsely occupied matrices $A^{N,P}(T)$ and $\Sigma^{N,P}(T)$:

(12a) We assume that for all $T \in \mathbb{R}_+$

pf(r):= sup on [0,T]} = o(E^ A Ei), (3.11) 

je{i,...,Noo} 

P^(T):= sup #{i€r^:p^'^^0on[0,T]} = o(E^AE^). (3.12) 

iG{l,...,Afoo} 


Then, recalling that the components of b and M are mutually uncorrelated. 


Noo 


rgiT) =snp sup ^ ^4^’^(T)^^,’P(T) /j)'’^(s)/^’^(t)Cov[6i(s),6/(t)] 

.GNs.telO.T] 


1/2 


+ /So+i)(s)/So+*)(^)Cov[6A,„„+i(s),6Ar„„+i(t)] 

j(Z-pN 

(T) + y/N 

<mv,{T)vf{T y -^ 0 , 


EN 

n-A 


Noo 


= sup sup 

*eNtG[0.T] 

\ 1/2 

+ ^ (^■P(T))2|E[(p;j^^^^^)(t))2]|Var[M^„„+,(l)] 

,^,v^P^(r) + Viv _ 

< (j){T)vp^M{T) --^ 0 , 

Ua 


and similarly $r_{10}^N(T), r_{12}^N(T) \to 0$ as $N \to \infty$.

(12b) $A^{N,P}(T)$ and $\Sigma^{N,P}(T)$ are sparse in the sense of (3.9). Then one can deduce $r_\iota^N(T) \to 0$ for $\iota = 9, 10, 11, 12$ as before.


We conclude this subsection with two remarks. 


Remark 3.6 In the sparseness conditions (3.8)-(3.12) it is not essential that the majority of entries is exactly zero. As one can see from the definition of the rates (3.2), they depend continuously on the underlying matrix entries. It therefore suffices that a large proportion of the matrix entries is small enough. □
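The counting quantity behind the sparseness conditions, together with the relaxation described in Remark 3.6, can be illustrated numerically. The following is a minimal sketch, assuming a finite truncation of the interaction matrix stored as a list of rows with zero-based indices; the function name `periphery_occupancy` and the tolerance parameter `tol` are hypothetical and not part of the original text. With `tol = 0` it counts exactly non-zero entries as in (3.9); with `tol > 0` it counts only entries that are not "small enough".

```python
def periphery_occupancy(matrix, periphery, tol=0.0):
    """Largest number of periphery columns j with |matrix[i][j]| > tol in any row i.

    matrix    -- list of rows (finite truncation of the interaction matrix)
    periphery -- iterable of column indices belonging to the periphery
    tol       -- entries with absolute value <= tol are treated as negligible
    """
    periphery = list(periphery)
    return max(
        sum(1 for j in periphery if abs(row[j]) > tol)
        for row in matrix
    )


# Toy example: column 0 is a core particle, columns 1-3 are periphery particles.
A = [
    [0.0, 1.0, 0.0, 0.001],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
exact_count = periphery_occupancy(A, [1, 2, 3])             # counts non-zero entries
relaxed_count = periphery_occupancy(A, [1, 2, 3], tol=0.01)  # ignores tiny entries
```

In an asymptotic check one would compute this quantity for growing network sizes and compare its growth with the normalizing sequence in (3.9).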

Remark 3.7 What can be said about Theorem 3.1 in the general case of nonlinear Lipschitz coefficients, apart from the special case discussed in Remark 3.5? In fact, a law of large numbers in the fashion of Theorem 3.1 can still be shown, but under more stringent conditions: namely, we have to require condition (10b) above in addition, with $A^{N,P}$ and $\Sigma^{N,P}$ now containing the Lipschitz constants of the interaction terms. The reason is that (10b) suffices to make $r_\iota^N(T)$ ($\iota \in \{3, 4, 7, \ldots, 12\}$) small. The remaining four rates are unrelated to these coefficients and therefore not affected by their nonlinear structure. It is important to notice that conditions like (10a) and (12a) are no longer sufficient to make the corresponding rates small. The reason is that they are conditions of correlation type. Since correlation is a linear measure of dependence, it is not surprising that these conditions are not suitable for the nonlinear case. We do not go into the details at this point. □
3.3 Networks arising from preferential attachment

As demonstrated in the last subsection, the crucial criterion for the rates (3.2) in Theorem 3.1 to vanish asymptotically with growing network size can be described as a combination of sparse interaction and sparse correlation among the particles. Condition (3.9) plays a distinguished role here: when valid, it implies that eight out of twelve rates in (3.2) are small. Moreover, it is the key factor for a nonlinear generalization of Theorem 3.1 to hold or not; see Remark 3.7. The aim of this subsection is therefore to find algorithms for the generation of the underlying networks such that the resulting interaction matrices satisfy (3.9). We will assume that $a^{N,P}(t) = a^{N,P}$ and $\sigma^{N,P}(t) = \sigma^{N,P}$ are independent of $t \in \mathbb{R}_+$, such that also $A^{N,P}(t)$ and $\Sigma^{N,P}(t)$ as well as $p_a^N(t)$ and $p_\sigma^N(t)$ (see (3.9) for their definitions) are independent of $t$. Furthermore, we only concentrate on $p_a^N$ as the analysis for $p_\sigma^N$ is completely analogous.

We will base the creation of the IPS network on dynamical random graph mechanisms. Since we are 
mainly interested in heterogeneous graphs, we will investigate the preferential attachment or scale-free
random graph Q. There are many similar but different constructions of preferential attachment graphs; 
in the following, we rely on the construction of @ for directed graphs. We remark that the random graphs 
to be constructed will be indexed by N, corresponding to a family of growing networks for the IPS. In 
particular, “time” in the random graph process must not be confused with the time $t$ in the IPS (2.2): the correct view is rather that the IPS network has been built from the random graphs before time $t = 0$, and, of course, independently of all random variables in (2.2).

The preferential attachment algorithm works as follows: we start with $G(0) = (V, E(0))$, a given graph consisting of vertices $V = \mathbb{N}$ and edges $E(0) = \{e_1, \ldots, e_\nu\}$, where $\nu \in \mathbb{N}$ and $e_i$ stands for a directed edge between two vertices. We allow for multiple edges and loops in our graphs. Without loss of generality, we assume that the set of vertices in $G(0)$ with at least one neighbour is given by $\{1, \ldots, n(0)\}$ with some $n(0) \in \mathbb{N}$. Furthermore, we fix $\alpha, \beta, \gamma \in \mathbb{R}_+$ with $\alpha + \beta + \gamma = 1$ and $\alpha + \gamma > 0$ and two numbers $\delta^{\mathrm{in}}, \delta^{\mathrm{out}} \in \mathbb{R}_+$. For $N \in \mathbb{N}$ we construct $G(N) = (V, E(N))$ from $G(N-1)$ according to the following algorithm.

• With probability $\alpha$, we create a new edge $e_{\nu+N}$ from $v = n(N-1) + 1$ to a node $w$ that is already connected in $G(N-1)$. Here $w$ is chosen randomly from $\{1, \ldots, n(N-1)\}$ according to the probability mass function

$$\frac{d^{\mathrm{in}}_{G(N-1)}(w) + \delta^{\mathrm{in}}}{\nu + N - 1 + \delta^{\mathrm{in}} n(N-1)}, \quad w \in \{1, \ldots, n(N-1)\},$$

where $d^{\mathrm{in}}_G(v)$ denotes the in-degree of vertex $v$ in a graph $G$. We define $n(N) := n(N-1) + 1$ and $E(N) := E(N-1) \cup \{e_{\nu+N}\}$.

• With probability $\beta$, a new edge $e_{\nu+N}$ is formed from some vertex $v \in \{1, \ldots, n(N-1)\}$ to some $w \in \{1, \ldots, n(N-1)\}$ (the case $v = w$ is possible). Here $v$ and $w$ are chosen independently according to the probability mass functions

$$\frac{d^{\mathrm{out}}_{G(N-1)}(v) + \delta^{\mathrm{out}}}{\nu + N - 1 + \delta^{\mathrm{out}} n(N-1)}, \qquad \frac{d^{\mathrm{in}}_{G(N-1)}(w) + \delta^{\mathrm{in}}}{\nu + N - 1 + \delta^{\mathrm{in}} n(N-1)}, \quad v, w \in \{1, \ldots, n(N-1)\},$$

respectively, where $d^{\mathrm{out}}_G(u)$ denotes the out-degree of vertex $u$ in a graph $G$. Moreover, we set $n(N) := n(N-1)$ and $E(N) := E(N-1) \cup \{e_{\nu+N}\}$.

• With probability $\gamma$, a new edge $e_{\nu+N}$ from some $v \in \{1, \ldots, n(N-1)\}$ to $w = n(N-1) + 1$ is formed. Here $v$ is chosen randomly according to the probability mass function

$$\frac{d^{\mathrm{out}}_{G(N-1)}(v) + \delta^{\mathrm{out}}}{\nu + N - 1 + \delta^{\mathrm{out}} n(N-1)}, \quad v \in \{1, \ldots, n(N-1)\}.$$

We set $n(N) := n(N-1) + 1$ and $E(N) := E(N-1) \cup \{e_{\nu+N}\}$.

Evidently, we always have $|E(N)| = \nu + N$ while the number $n(N)$ of non-isolated vertices in $G(N)$ is random in general.
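The three-step construction above can be sketched in a few lines of code. This is an illustrative simulation only, not the authors' implementation: the function name `simulate_directed_pa`, its parameters, and the default initial graph (a single loop at vertex 1) are assumptions made for this example. It further assumes that the non-isolated vertices of the initial graph are labelled $1, \ldots, n(0)$, as in the text.

```python
import random


def simulate_directed_pa(steps, alpha, beta, gamma, delta_in, delta_out,
                         initial_edges=((1, 1),), seed=0):
    """Grow G(0), ..., G(steps); return (edges, n, in_deg, out_deg)."""
    if abs(alpha + beta + gamma - 1.0) > 1e-12:
        raise ValueError("alpha + beta + gamma must equal 1")
    rng = random.Random(seed)
    edges = list(initial_edges)
    n = max(max(v, w) for v, w in edges)          # n(0)
    in_deg = {u: 0 for u in range(1, n + 1)}
    out_deg = {u: 0 for u in range(1, n + 1)}
    for v, w in edges:
        out_deg[v] += 1
        in_deg[w] += 1

    def pick(deg, delta):
        # Choose an existing vertex with probability proportional to deg + delta.
        vertices = range(1, n + 1)
        weights = [deg[u] + delta for u in vertices]
        return rng.choices(vertices, weights=weights)[0]

    for _ in range(steps):
        u = rng.random()
        if u < alpha:                  # new vertex -> existing vertex
            w = pick(in_deg, delta_in)
            n += 1
            v = n
        elif u < alpha + beta:         # existing vertex -> existing vertex
            v = pick(out_deg, delta_out)
            w = pick(in_deg, delta_in)
        else:                          # existing vertex -> new vertex
            v = pick(out_deg, delta_out)
            n += 1
            w = n
        in_deg.setdefault(n, 0)        # register a newly created vertex, if any
        out_deg.setdefault(n, 0)
        edges.append((v, w))
        out_deg[v] += 1
        in_deg[w] += 1
    return edges, n, in_deg, out_deg


edges, n, in_deg, out_deg = simulate_directed_pa(
    steps=200, alpha=0.3, beta=0.4, gamma=0.3, delta_in=1.0, delta_out=1.0, seed=1)
max_in, max_out = max(in_deg.values()), max(out_deg.values())
```

The sampling step recomputes all weights in every iteration, which is quadratic in the number of steps; this keeps the sketch short and is adequate for small experiments with the maximal degrees `max_in` and `max_out`.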
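The most important result for our purposes is the following one. We define

$$M^{\mathrm{in}}(N) := \max\{d^{\mathrm{in}}_{G(N)}(i) : i \in \mathbb{N}\}, \qquad M^{\mathrm{out}}(N) := \max\{d^{\mathrm{out}}_{G(N)}(i) : i \in \mathbb{N}\}, \quad N \in \mathbb{N}_0,$$

as the maximal in-degree and out-degree in $G(N)$, respectively.

Lemma 3.8. The maximum in-degree $M^{\mathrm{in}}(N)$ and out-degree $M^{\mathrm{out}}(N)$ of $G(N)$ satisfy the following asymptotics:

$$c^{\mathrm{in}}(N) M^{\mathrm{in}}(N) \to \mu^{\mathrm{in}}, \qquad c^{\mathrm{out}}(N) M^{\mathrm{out}}(N) \to \mu^{\mathrm{out}}, \quad N \to \infty. \tag{3.13}$$

Here the convergence to the random variables $\mu^{\mathrm{in}}$ and $\mu^{\mathrm{out}}$, respectively, holds in the almost sure as well as in the $L^p$-sense for all $p \in [1, \infty)$, and $(c^{\mathrm{in}}(N))_{N\in\mathbb{N}}$ and $(c^{\mathrm{out}}(N))_{N\in\mathbb{N}}$ are sequences of random variables which can be chosen such that for every $\epsilon \in (0, \alpha + \gamma)$ we have a.s.

$$c^{\mathrm{in}}(N)^{-1} = O\Big(N^{\frac{\alpha+\beta}{1+\delta^{\mathrm{in}}(\alpha+\gamma-\epsilon)}}\Big), \qquad c^{\mathrm{out}}(N)^{-1} = O\Big(N^{\frac{\beta+\gamma}{1+\delta^{\mathrm{out}}(\alpha+\gamma-\epsilon)}}\Big), \quad N \to \infty. \tag{3.14}$$

It follows from this lemma that for every $\epsilon \in (0, \alpha + \gamma)$ we have a.s.

$$M^{\mathrm{in}}(N) = O\Big(N^{\frac{\alpha+\beta}{1+\delta^{\mathrm{in}}(\alpha+\gamma-\epsilon)}}\Big), \qquad M^{\mathrm{out}}(N) = O\Big(N^{\frac{\beta+\gamma}{1+\delta^{\mathrm{out}}(\alpha+\gamma-\epsilon)}}\Big), \quad N \to \infty.$$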

In particular, if $G(N)$ is used to model the underlying network of $a^{N,P}$ (i.e. an edge in $G(N)$ from $i$ to $j$ is equivalent to $a^{N,P}_{ij} \neq 0$), we have

$$p_a^N = O\Big(N^{\frac{\beta+\gamma}{1+\delta^{\mathrm{out}}(\alpha+\gamma-\epsilon)}}\Big), \quad N \to \infty.$$

In other words, the first part of condition (3.9) holds as soon as $R_a^N$, as specified through (3.6) and (3.7), increases in $N$ at least with rate

$$N^{\frac{\beta+\gamma}{1+\delta^{\mathrm{out}}(\alpha+\gamma-\epsilon)}} \tag{3.15}$$

for some small $\epsilon$. For example, in the classical case of Example 3.4 where $R_a^N = N$, this is always true except in the case $\alpha = \delta^{\mathrm{out}} = 0$, where all edges start from one of the initial nodes with probability one.

We conclude that in all non-trivial situations of the preferential attachment model, the resulting networks are sparse enough for the law of large numbers implied by Theorem 3.1 to be in force.
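To get a feeling for the growth rates involved, one can evaluate the exponents of $N$ numerically. The following sketch assumes that the exponents are $(\alpha+\beta)/(1+\delta^{\mathrm{in}}(\alpha+\gamma-\epsilon))$ for the maximal in-degree and $(\beta+\gamma)/(1+\delta^{\mathrm{out}}(\alpha+\gamma-\epsilon))$ for the maximal out-degree, which is how the bounds following Lemma 3.8 are read here; the function name `max_degree_exponents` is hypothetical.

```python
def max_degree_exponents(alpha, beta, gamma, delta_in, delta_out, eps):
    """Exponents e_in, e_out such that the maximal in-/out-degree is O(N**e).

    Assumes the form of the bounds stated in the lead-in; eps must lie in
    the open interval (0, alpha + gamma).
    """
    if not 0 < eps < alpha + gamma:
        raise ValueError("eps must lie in (0, alpha + gamma)")
    shift = alpha + gamma - eps
    e_in = (alpha + beta) / (1 + delta_in * shift)
    e_out = (beta + gamma) / (1 + delta_out * shift)
    return e_in, e_out


# Generic case: the out-degree exponent is strictly below 1, so the maximal
# out-degree grows sublinearly and a normalization of order N is fast enough.
e_in, e_out = max_degree_exponents(0.2, 0.5, 0.3, 1.0, 1.0, eps=0.1)

# Degenerate case alpha = delta_out = 0: the exponent equals beta + gamma = 1.
_, e_out_degenerate = max_degree_exponents(0.0, 0.6, 0.4, 1.0, 0.0, eps=0.1)
```

The degenerate case reproduces the exception noted in the text: with $\alpha = \delta^{\mathrm{out}} = 0$ the bound on the maximal out-degree is only linear in $N$.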


4 Large deviations 

In Theorem 3.1 we have established bounds on the mean squared difference between the IPS (2.2) and the PMFS (2.4). In Sections 3.1-3.3 we have given examples of dynamical networks in which these bounds converge to 0 as the network size increases. A natural question is now whether a large deviation principle holds as $N \to \infty$, which would then assure that the probability of $X^N$ deviating strongly from $\bar X^N$ decreases exponentially fast in $N$. In the classical case of homogeneous networks, [11] is the first paper to prove a large deviation principle for the empirical measures of the processes (1.1). For heterogeneous networks, however, the empirical measure might no longer be a good quantity to investigate: the weight of a particle now depends on which particle's perspective is chosen. A sequence of differently weighted empirical measures seems to be more appropriate, but then their analysis becomes considerably more involved. Therefore, in this paper we take a more direct approach and study the large deviation behaviour of the difference $X^N - \bar X^N$ itself. In order to do so, we have to put stronger assumptions on the coefficients than in the previous sections. These are as follows.

(A1) $X^N(0)$ is deterministic for each $N \in \mathbb{N}$.

(A2) For all $N \in \mathbb{N}$ we have $\sigma^N = 0$. All other coefficients are constant in time.

$\rho^{N,C}$ and $\rho^{N,P}$ (resp. $a^{N,C}$ and $a^{N,P}$) only have $\gamma(N)$ (resp. $\Gamma(N)$) non-zero columns, where $\gamma(N)$ forms a sequence of natural numbers increasing to infinity and $\Gamma(N)$ grows at most like $\exp(\gamma(N))$.
(A3) All numbers in (3.1), which are indexed by $N$ now, are bounded independently of $N$.

(A4) $(M_i : i \in \mathbb{N})$ is a sequence of independent mean-zero Lévy processes whose Brownian motion part has variance $c_i$ and whose Lévy measure is $\nu_i$. Moreover, there exists a real-valued mean-zero Lévy process $M_0$ that dominates $M_i$, that is, its characteristics $c_0$ and $\nu_0$ satisfy $c_i \le c_0$ and $\nu_i(A) \le \nu_0(A)$ for all $i \in \mathbb{N}$ and Borel sets $A \subseteq \mathbb{R}$, and that has finite exponential moments of all orders $u \in \mathbb{R}_+$.

(A5) Assume that G^(t,s) := s,t G [0,T], converges uniformly to a limit 

G(t,s) G 

sup sup s)| < oo, sup sup \G^j(t, s) — Gij{t, s)\ ^ 0, N ^ oo. 

i.jeN s,tg[0,T] iJeN s,te[0,T] 

(A6) With R^(t) := t ^ there exists R(t) G such that 

sup sup \Rij(t)\ < oo, sup sup \Rij(t) — Rij{t)\ —>-0, N ^ oo. 
ij'eNte[0.T] i.jeNtG[0,T] 

(A7) The following two quantities are finite: 


<71 := lim sup <71 (iV) 

N—¥oo 


q2 := limsupq2(A^) 

N—^OO 


:= limsupsup 7 (A^) EE 


N—><X) V 


N,V N,C\ 

% I 


i=i k^i 


:= limsup sup 7(^) E 
Ar-).oo i,feeN 


1=1 


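(A8) Define for $m \in \mathbb{N} \cup \{0\}$

$$\Psi_m(u) := \tfrac{1}{2} c_m u^2 + \int_{\mathbb{R}} \big(e^{uz} - 1 - uz\big)\, \nu_m(\mathrm{d}z), \quad u \in \mathbb{R}_+.$$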

We assume that the following holds for every d G N: denoting for m G N, r G [0, T] and 0 G M^ 

/ T pT ^ pT ^ 

/ '^Gim{t-s,s-r)0i{dt)ds+ ^ i?™(t - r) 0^(dt), 

i=i 

the sequence ( '^m{Hm(0i r)) dr) is Cesaro summable, i.e. the following limit exists: 

V / 

, 7(1V) X 

EX (“) 

' ^ ' m—1 ^ 


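Theorem 4.1. Let $T \in \mathbb{R}_+$. Under (A1)-(A8), the sequence $(X^N - \bar X^N)_{N\in\mathbb{N}}$ satisfies a large deviation principle in $(D_T^\infty, J_1)$ with a good rate function $I\colon D_T^\infty \to [0, \infty]$, that is, for every $\alpha \in \mathbb{R}_+$ the set $\{x \in D_T^\infty : I(x) \le \alpha\}$ is compact in $D_T^\infty$ (with respect to the $J_1$-topology), and for every $M \in \mathcal{D}_T^\infty$ we have

$$-\inf_{x \in \operatorname{int} M} I(x) \le \liminf_{N\to\infty} \frac{1}{\gamma(N)} \log \mathbb{P}[X^N - \bar X^N \in M] \le \limsup_{N\to\infty} \frac{1}{\gamma(N)} \log \mathbb{P}[X^N - \bar X^N \in M] \le -\inf_{x \in \operatorname{cl} M} I(x),$$

where $\operatorname{int} M$ and $\operatorname{cl} M$ denote the interior and the closure of $M$ in $(D_T^\infty, J_1)$, respectively. Moreover, the rate function $I$ is convex, attains its minimum 0 uniquely at the origin and is infinite for $x \notin AC_T^\infty$.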


Remark 4.2 (1) We cannot drop the requirement $\sigma^N = 0$ or condition (A4) in Theorem 4.1. If violated, the processes $X^N$ and $\bar X^N$ will typically not have exponential moments of all orders, whose existence is essential for our proof below. This kind of problem does not arise when empirical measures are considered as in [11, 25] for the homogeneous case.
(2) The Cesàro summability condition (A8) accounts for the possible inhomogeneity of the coefficients and the distribution of the noises. It holds in particular for the homogeneous case. Since a convergent sequence is Cesàro summable with the same limit, it also holds when we have asymptotic homogeneity (in the sense that the sequence inside the sum over $m$ converges as $m \to \infty$).


(3) With $\gamma(N) = N$ and assumptions (A2)-(A4) in force, McKean's example (1.1) or the model considered in [ 2 ^ both satisfy the assumptions of the theorem. In McKean's case our large deviation principle follows from that of [11] for the empirical measure by applying the contraction principle.
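The fact used in part (2) of the remark, namely that a convergent sequence is Cesàro summable with the same limit while the converse fails, can be checked numerically. The snippet below is only an illustration on two toy sequences; the helper `cesaro_means` is a hypothetical name and is unrelated to the specific sequence appearing in (A8).

```python
def cesaro_means(seq):
    """Return the running averages (x_1 + ... + x_m) / m for m = 1, 2, ..."""
    means, total = [], 0.0
    for m, x in enumerate(seq, start=1):
        total += x
        means.append(total / m)
    return means


# A convergent sequence (limit 1): its Cesaro means approach the same limit.
convergent = [1.0 + 1.0 / m for m in range(1, 10001)]
convergent_mean = cesaro_means(convergent)[-1]

# A non-convergent sequence that is still Cesaro summable (with limit 0).
oscillating = [(-1.0) ** m for m in range(1, 10001)]
oscillating_mean = cesaro_means(oscillating)[-1]
```

The second example shows why (A8) is strictly weaker than asymptotic homogeneity: averaging can smooth out inhomogeneities that do not vanish individually.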


5 Proofs 

We start with some preparatory results that are needed for the proof of Theorem 3.1.

Lemma 5.1. Under the assumptions of Theorem 3.1 we have

$$\sup_{i\in\mathbb{N}} \|\bar X_i^*(T)\|_{L^2} \le V(T),$$

where $V(T)$ is given in Theorem 3.1.

Proof. It is a consequence of (2.4) and the Burkholder-Davis-Gundy inequality that for all $t \in [0, T]$ and $i \in \mathbb{N}$



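Therefore, if we define $w(t) := \sup_{i\in\mathbb{N}} \|(\bar X_i)^*(t)\|_{L^2}$, we obtain

$$\begin{aligned} w(t) &\le v_X + v_f(T) v_b(T) T + 2 v_{\rho,M}(T) T^{1/2} + v_a(T) \int_0^t w(s)\,\mathrm{d}s + 2 v_L v_\sigma(T) \Big(\int_0^t (w(s))^2\,\mathrm{d}s\Big)^{1/2} \\ &\le v_X + v_f(T) v_b(T) T + 2 v_{\rho,M}(T) T^{1/2} + \big(v_a(T) T^{1/2} + 2 v_L v_\sigma(T)\big) \Big(\int_0^t (w(s))^2\,\mathrm{d}s\Big)^{1/2}. \end{aligned}$$

Now we square the last inequality, apply the basic estimate $(a + b)^2 \le 2(a^2 + b^2)$ and use Gronwall's inequality to deduce our claim, namely that

$$w(T) \le \sqrt{2}\, e^{(v_a(T) T^{1/2} + 2 v_L v_\sigma(T))^2 T} \big(v_X + v_f(T) v_b(T) T + 2 v_{\rho,M}(T) T^{1/2}\big).$$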

□ 
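For readers who want the final step of the preceding proof spelled out, the following worked derivation may help. It is a sketch under the assumption that the last inequality of the proof has the form $w(t) \le C + K (\int_0^t w(s)^2\,\mathrm{d}s)^{1/2}$; the shorthand $C$ (for the terms not involving $w$) and $K$ (for the factor in front of the integral term) is introduced here and does not appear in the original.

```latex
% Assumed starting point (shorthand C, K introduced for this sketch only):
%   w(t) \le C + K \Big( \int_0^t w(s)^2 \, \mathrm{d}s \Big)^{1/2}, \quad t \in [0,T].
\begin{align*}
w(t)^2 &\le 2C^2 + 2K^2 \int_0^t w(s)^2 \,\mathrm{d}s
  && \text{by } (a+b)^2 \le 2(a^2+b^2), \\
w(t)^2 &\le 2C^2 e^{2K^2 t}
  && \text{by Gronwall's inequality applied to } w^2, \\
w(T) &\le \sqrt{2}\, C\, e^{K^2 T}
  && \text{after taking square roots at } t = T.
\end{align*}
```

The factor $\sqrt{2}$ and the squared constant in the exponent of the final bound both originate from the squaring step.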


Lemma 5.2. Let $T \in \mathbb{R}_+$ and assume the finiteness of the numbers (3.1). We fix some $j \in \mathbb{N}$ throughout this lemma and define for $t \in [0, T]$


Y,{t) := + Yfit) + Yfit) + Y^it) + Y^{t) 

t 


:= (X,(0)-E[X,(0)])+^ / aC(s)(Xfe(s)-E[Xfc(s)])ds 

OO .t 

+ 12 {'^?kis)Xk{s-)+crJf,{s)E[Xk{s)])dLj{s) 

OO OO 

+ 12 f?k{s){bk{s)-E[bk{s)])ds + '^ pfk{s)dMk{s). 
Jo Jo 


Furthermore, introduce the integrals 


loi^W) ■= x{t), Ii[x]{t):= afJs)Ii_^[x]{s)ds, n € N, 


(5.1) 


where $x\colon [0, T] \to \mathbb{R}$ is a measurable function such that the integrals in (5.1) exist for $t \in [0, T]$. Then

OO 5 OO 


X,(i) - E[X, {t)] =Y,Ii[Ym = 1212 [Yl]it), t£[0,T], 


(5.2) 


n—0 


i—1 n=0 
-2 


where the sums converge with respect to the maximal $L^2$-norm $X \mapsto \|X^*(T)\|_{L^2}$.

Proof. We deduce from (2.4) that

X,it) - E[X,it)] 

= (X,(0) - E[X,(0)]) + ^ / a%is)iXkis) - E[Xfc(s)]) ds 

pt OO pt 

+ / + ^ / (4(s)Xfe(s-)+aP(s)E[Xfe(s)])di,(s) 

Jo k=P^ 

OO pt OO pt 

+ 12 f?k{s){bk{s)-E[bk{s)])ds + '^ pfk{s)dMk{s) 






= If[X, - E[Xfi]it) + Y,it). 


Iterating this equality n times, we obtain 

71 — 1 


X, (t) - E[4 (t)] KKO + II [4 - E[4]](i)> i e [0, T]. 


(5.3) 




Next, observe that for any càdlàg process $(X(t))_{t\in\mathbb{R}_+}$ with $\|X^*(T)\|_{L^2} < \infty$ we have

(j[C ('J')Y 

||(7^[x])*(r)||^, < ||x*(r)|U./^[i](r) < ||x*(r)iu / , 

which is summable in $\nu$. Thus, recalling from Lemma 5.1 that both $Y_j$ and $\bar X_j - \mathbb{E}[\bar X_j]$ have finite maximal $L^2$-norm, we can let $n \to \infty$ in (5.3) and get

OO 

4 (<) - E[4 (t)] =Y,Ii K](i)> i e [0, T ], 


y=0 


which is the first assertion. The second part of formula (5.2) holds by linearity.


□ 
Proof of Theorem 3.1. The existence and uniqueness of solutions to (2.2) and (2.4) follow from the general theory of SDEs, see [32], Theorem V.7. Since the numbers (3.1) are finite, there are no difficulties in dealing with infinite-dimensional systems as in our case.

It follows from (2.2) and (2.4) that the difference between $X$ and $\bar X$ satisfies the SDE


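$$\begin{aligned} \mathrm{d}(X(t) - \bar X(t)) &= \big(a(t)(X(t) - \bar X(t)) + a^P(t)(\bar X(t) - \mathbb{E}[\bar X(t)])\big)\,\mathrm{d}t \\ &\quad + \big(\sigma(t)(X(t-) - \bar X(t-)) + \sigma^P(t)(\bar X(t-) - \mathbb{E}[\bar X(t)])\big) \cdot \mathrm{d}L(t) \\ &\quad + f^P(t)(b(t) - \mathbb{E}[b(t)])\,\mathrm{d}t + \rho^P(t)\,\mathrm{d}M(t), \quad t \in \mathbb{R}_+, \\ X(0) - \bar X(0) &= 0. \end{aligned}$$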


Thus, denoting the left-hand side of (3.3) by $\Delta(T)$, we obtain from the Burkholder-Davis-Gundy inequality and Jensen's inequality that


A{T)<Va{T) [ A(t)dt + 2u<,(rK ( [ (A(t))2dt 
Jo \Jo J 


1/2 


PT sup 
te[o,T] 


a^{t){X{t)-E[Xm\\L2 di 

f{t){b{t)-E[bm 




{a^{X-E[X]).L)*{T) 
{p^-MY(T) 




lL2 


/ T \ 4 

< (ri/\„(r) + 2vYT)vl) f (A(t))2 dt j + ^ A‘(T), 


(5.4) 


where $\Delta^\iota(T)$ stands for the last four summands in the line before. So Gronwall's inequality produces the bound

$$\Delta(T) \le K(T) \sum_{\iota=1}^{4} \Delta^\iota(T), \tag{5.5}$$

where $K(T) = \sqrt{2} \exp\big((T^{1/2} v_a(T) + 2 v_\sigma(T) v_L)^2 T\big)$. We now consider each $\Delta^\iota(T)$ separately.

For i = 3 we simply have 


A^(T)<T sup sup X! =TrYT). 

tG[o,T] *eN j 

For i = 4 another application of the Burkholder-Davis-Gundy inequality yields 

1/2 


X^{T) < 2sup ^ E 


zGN 




P*,(0P*fe(0d[Mj,Mfc](t) 


1/2 


<2T'^/^sup sup XI =2T^/‘^re{T). 

«6NtG[0.T] I ' 


(5.6) 


(5.7) 


For $\iota = 1$, we use Lemma 5.2 including the notations introduced there and the fact that for all stochastic processes $(X(t))_{t\in\mathbb{R}_+}$ and $(Y(t))_{t\in\mathbb{R}_+}$ with càdlàg sample paths we have


sup \E[li[X]is)l'Y[Y]{r)] 

r,se[0,T] 


< 


{AfYT)r {A<^YT)) 

n! to! 


sup |E[Jf (s)T(r)] I 

r,se[0,T] 


(5.8) 
for any $j, k \in \mathbb{N}$ and $m, n \in \mathbb{N} \cup \{0\}$. In this way we obtain


A^(T) = 


\a^mX{t)-E[X{t)])\\^, dt 


= sup / 
iGN Jo 


J2aimx,it)-E[x,m 

i=i 


5 .T 

< ^ sup / 


L^l 

5 


/O 




j—l n=0 

7 ^ / oo oo 


dt 


L2 


L^l 


/o 


= T.^'^e L I E E I dt 

n,m—0 


1/2 




pX / cx) 


\ 1/2 


< e 


|A^(T)|aEsnp / E sup |E[l^‘(s)n‘M]| dt 


t=l 

5 




e[o,t] 




=. el^CTild 


E^‘(^)- 




Using Lemma 5.1, the five terms in (5.9) can be estimated as follows:


( oo 

E Al{T)Al{T)\Coy[X,{0),X, 

j,k=i 

oT oo 


1/2 


= Tri{T), 


i? 2 (T)<sup / E4(^) llE(«)IU^dt 

zeN JO se[oj] 

AC(s)||Xfe(s)-E[Xfe(s)]lU2ds| dt 


J = 1 
pT oo 



<sup/ E^d(o 

iSN Jo ^ 

r^2 rp 2 

< y(r)supEE4^?fe = 


/ oo 

i? 3 (T) < sup / E sup e[( + a^E[X]) ■ L,) (s) 

iefiJo \ sG[o.i] LV 7 / 


\j,k=l 


X ((aCx + aPE[X])^,.Lfe)(s) 


1/2 


dt 


<sup r ( E 4(^)^rfe(OCov[L,(l),Lfe(l)] 

Vj.fe=i 


dt 


L2 


X / E 


(aC(s)X(s) + uP(s)E[X(s)])^. (aC(s)X(s) + i 7 P(s)E[X(s)]), 


\ 1/2 

ds 1 dt 


< -T^/\AT)V{T)r3{T), 


(5.9) 


7^ i oo ^ 

i? 4 (T) < sup / E [ [ 

iGN Jo \ ^ Jo Jo 


E fFi(^)fSm{r)Cov[bi{s),bm{r)] 


l,m—l 


1/2 


T- 


- ^s'^p E s^p 


v/.fc=i 


s,tG[0.T] 


E /ji(s)/LWCov[6i(s),6™(t)] 


L?71=1 


dr ds 
1/2 


dt 
/ji2 


pi I 

RsiT) < sup / ^ AP (T)AP (T) sup 

ieN Jo \j,k=i 

< sup r I f; 4(r)JlP (T) [ 
*eN Jo \ Jo 


u'.fc=i 


^ E[(pC.Mi)(s)(pL-M^)(s)] 

l,m—l 

1/2 

IE [p®;(s)Ci™(s)pL(s)] 


1/2 


di 


Lm=l 


ds dt 


< -T3/2rn(T). 

The last step in the proof is the estimation of $\Delta^2(T)$. To this end, we make use of the Burkholder-Davis-Gundy inequality another time and get


A (T) < sup 

ieN 


{{a^{X-E[X]))^-L,)\T) 


L2 


< 2vl sup / 
iSN \ Jo 


E 


\ 1/2 

(aP(t)(A(t)-E[A(t)]))^] dtj , 


The further procedure is analogous to what we have done for $\Delta^1(T)$: instead of $a^P$ we have $\sigma^P$ here. We leave the details to the reader and only state the result, which is

$$\Delta^2(T) \le \sum_{\iota \in \{2, 4, 8, 10, 12\}} K_\iota(T)\, r_\iota(T).$$

This completes the proof of Theorem 3.1.


□ 


Our next goal is to prove Lemma 3.8 concerning the rate of growth of the maximal degree in the preferential attachment random graph as described in Section 3.3. For the undirected version as in Q the corresponding result goes back to [i^. Indeed, the proof there basically works for our case as well, but there are some steps that require different arguments. Thus, we decided to include the proof of our lemma.


Proof of Lemma 3.8. The statement is evidently true for $M^{\mathrm{in}}$ when $\alpha + \beta = 0$ (resp. for $M^{\mathrm{out}}$ when $\beta + \gamma = 0$). In fact, for this extremal case, in every step of the random graph a new edge is created pointing to (resp. from) a new node. This means that $M^{\mathrm{in}}(N)$ (resp. $M^{\mathrm{out}}(N)$) remains constant for all $N \in \mathbb{N}_0$, and the claim follows with $c^{\mathrm{in}} = 1$ (resp. $c^{\mathrm{out}} = 1$) identically. In the other cases, we closely follow the proof of Theorem 3.1 in [2^. In addition to the notation introduced in Section 3.3 we further define for $N \in \mathbb{N}_0$ and $\circ \in \{\mathrm{in}, \mathrm{out}\}$:


S^{N) 

3 

c^{ 0 ,k) 

Z%N,j,k) 

Q{N) 


u + N + 5^n[N), 
d^w(J) + <5^ jeN, 
inf{AGNo:d^(jv)(j) J^O}, j G N, 


^^{o—in} “f /? “t“ 7^{o—out}? 

I, c^(A + l,fc) :=c®(A,fc) 


S^{N) 


+ N, k) 


S'o(A) +sOfc’ 
AO(a; + A, j) +/c - 1 
k 


k G 






a{e^+i : / = 1 ,..., A), ^(oo) := cr I (J g{N) j . 


\.N=0 


k G K+, 


Obviously, $\mathcal{G}(N)$ is the $\sigma$-field of all information up to step $N$ in the preferential attachment algorithm, and $N_j^\circ$ is a stopping time relative to the filtration $(\mathcal{G}(N))_{N\in\mathbb{N}_0}$ for every $j \in \mathbb{N}$. Analogously to Theorem 2.1 of [ 2 ^ one can now show that for all $k \in \mathbb{R}_+$ and $j \in \mathbb{N}$ the sequence $(Z^\circ(N, j, k))_{N\in\mathbb{N}_0}$ is a positive martingale relative to the filtration $(\mathcal{G}(N_j^\circ + N))_{N\in\mathbb{N}_0}$. As a consequence, Doob's martingale convergence theorem implies that

Z%N, 3 ,k)^C{j.k) a.s. (5.10) 

for some random variables $\zeta^\circ(j, k)$. The convergence in (5.10) also holds in $L^p$ for all $p \in [1, \infty)$ because we have

Z%N, 3 ,kY <C{k,p)Z%N,j,kp) a.s. (5.11) 

for some deterministic constants $C(k, p) \in \mathbb{R}_+$ independent of $N$ and $j$. Indeed, on $\{N_j^\circ < \infty\}$ we have by definition

z®(TV,j, kY _ c®(A^; + TV, kY + tv, j) + /c - iV /x®(tv; + tv, j) + fcp -1 

Z^{N,j,kp) c®(TV® + TV,/cp) V k / \ kp 

where 

c®(TV,fc)P c^{N-l,kY S^{NY~^{S^{N) + s^kp) ^{N-l,kY c®( 0 ,fc)P 

d^{N,kp) c®(TV—l,fcp) (S'®(TV) + s®/c)P “ d^{N — \,kp) ~ ~ c®(0,fcp) 

a; +/c - iV/a; + A:p - l\ _ r(A:p + 1) r{k + xY x^ooT{kp+l) 
k / \ kp ) r(A: + 1)P r(a;)P“^r(fcp + a;) r(fc+l)P’ 

which shows (5.11). Next, define for $N \in \mathbb{N}_0$ and $j \in \mathbb{N}$



m%N, 3 ) := max{Z®(TV - TV®, *, 1): * = 1,..., 3 , TV® < TV}, 
p®(j) := max{C®(T,l): T = 1 ,...,}}, 


m®(TV) :=m^(TV,n(TV)), 

/a® :=sup{p®(j): jGN}, 


such that in particular the relationship $m^\circ(N) = c^\circ(N, 1)(M^\circ(N) + \delta^\circ)$ holds. It is not hard to see that $(m^\circ(N))_{N\in\mathbb{N}_0}$, as the maximum of martingale expressions, is a submartingale relative to $(\mathcal{G}(N))_{N\in\mathbb{N}_0}$. By definition the sequence $(c^\circ(N, k))_{N\in\mathbb{N}_0}$ decreases to 0 as $N \to \infty$; more precisely, we have


c^iV, *.) = c-(iV - I, , < c-(iV - I, *,)- » 


S'®(Af - 1) + s®fc 


^ + TV — 1 + (5®(u(0) + TV — 1) + s®fc 


< 


N-l 

n 


(1 + (5®)j + (5®n(0) + v ^ 1+50 ) 


(1 + (5®)j + (5®n(0) + n + s^k p _|_ <5<>n(o)+^t/+sofc ^ 


TV i+^, TV 


As a consequence, when p is large enough, 


n{N) 

E ^"(TV-TV®,*,1)^> 

< C(l,p)E 

'n(N) 

E ZYN-Nf,i,p) 





< C{l,p)pE[Z^0,z,p)] < C(l,p)(^^(°^ ^E[c®(TV®,p)] 

(^n(0) + gE[c®(a,p)]^ <00 


(5.12) 


independently of $N$. This implies that the submartingale $m^\circ$ converges a.s. and in $L^p$ for all $p \in [1, \infty)$. It follows from (5.12) that for $j > n(0)$ we have


E[(to^(TV)-to®(TV,j))p] <E 

<C(l,p) 


n{N) 

Y, ZYN-Nr,z,lY 

i=j+l 


n(0) + p + 5® — 1 
P 


E ]E[c®(T,p)]. 

i=j — n{0)-\-l 


(5.13) 
Letting $N \to \infty$, the left-hand side of (5.13) converges to

$$\mathbb{E}\Big[\Big(\lim_{N\to\infty} c^\circ(N, 1) M^\circ(N) - \mu^\circ(j)\Big)^p\Big],$$

while the right-hand side is independent of $N$. Now taking the limit $j \to \infty$ and again assuming that $p$ is large, we obtain the desired result (3.13). Note at this point that $\mu^\circ$ is indeed an a.s. finite random variable that belongs to $L^p$ for all $p \in [1, \infty)$, which is proved using a similar argument as in (5.12).

It remains to prove (3.14). To this end, observe that by the law of large numbers we have $(n(N) - n(0))/N \to \alpha + \gamma$ a.s. In other words, there exists for every $\epsilon \in (0, \alpha + \gamma)$ a possibly random $\bar N \in \mathbb{N}$ such that for all $N \ge \bar N$ we have

$$\Big|\frac{n(N) - n(0)}{N} - (\alpha + \gamma)\Big| < \epsilon,$$

or, equivalently, $n(N) \in [n(0) + (\alpha + \gamma - \epsilon)N,\, n(0) + (\alpha + \gamma + \epsilon)N]$. Consequently, for all $k \in \mathbb{N}$ and $N \ge \bar N$

N-l N-l N-l 


>(iv,fc)=n 


SOi^) 


N-l 

n 

2^0 
N-l 

xR 

2=0 


Soi^) 


> 


n 


^O(i) 


n 


v + i + S^(n( 0 ) + (a + j — e)i) 


S^{i) + s^k v + i + S^(n(0) + (a + "/ — e)i) + s^k 




N-l 

n 


1 ^ + i + S^{n{0) -I- (a -I- 7 — e)i) + s^k 


^k v + i + (5*(n(0) -I- (a -|- 7 — e)i) 


So(i) + s 

1 ^ + 1 + i5*(n(0) -I- (a -I- 7 — e)i) 


1 / + i + S^(n(0) + (a + j — e)i) + s^k 


= c^{N,k) 


\N,k) 


f AT \ fc \ ■p / A/' I 

^ l+50(Q,+..y_e) J ^ i+So(^a+'y-e) 

y_\ p / AT I <S°ri.(0)+iy+aOfc ^ 

— e) J H-5®(a-|-7—e) J 


TiN + 


(5Ort(0)-|-: 
1+5‘^(q;+7 — 


( 

T' f AT I 5'^n(0) + ^'+s'^ fc 

iy_^±!!if±Zz!lZ ;v-T 


r(TV + 


(5<^n(0) + i/ 


gOfc 

jV” l + 6^(a + T-e) ^ 


N ■ 


1+(5^(q:+ 7 —e) ^ 

So choosing $c^\circ(N) := c^\circ(N, 1)$ for $N \in \mathbb{N}_0$ fulfills (3.14).


□ 


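Finally, we turn to the proof of the large deviation result in Section 4.

Proof of Theorem 4.1. By definition, $X^N - \bar X^N$ satisfies the equation

$$X^N(t) - \bar X^N(t) = \int_0^t a^N (X^N - \bar X^N)(s)\,\mathrm{d}s + \int_0^t a^{N,P}\big(\bar X^N(s) - \mathbb{E}[\bar X^N(s)]\big)\,\mathrm{d}s + \rho^{N,P} M(t), \quad t \in [0, T],$$

whose solution is

$$X^N(t) - \bar X^N(t) = \int_0^t e^{a^N(t-s)} a^{N,P}\big(\bar X^N(s) - \mathbb{E}[\bar X^N(s)]\big)\,\mathrm{d}s + \int_0^t e^{a^N(t-s)} \rho^{N,P}\,\mathrm{d}M(s), \quad t \in [0, T].$$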


In order to establish a large deviation principle, it suffices by Theorem 4.6.1 of [15] to prove such a principle in $(D_T^d, J_1)$ for the first $d$ coordinates of the process $X^N - \bar X^N$ for every $d \in \mathbb{N}$, that is, for the $D_T^d$-valued process

yy(t) + + 

ft 


E 




^0 -^0 

nt 00 


■'IJ 


/o 


E er/'-'a"'- 


00 


E 

3,k=l • 


L'fc 


(t-s) N,P 


E (r) - E[x/'' (r)]) dr ds 

l^k 


Pjk dMfc(s), 


i = 1 ,... ,d, t £ [ 0 ,T] 


(5.14) 


































where we have used the formula 

i-t 


Xf(t)-E[Xf(t)]= / e' 
10 


n J n 

Jo Jo 




valid for all $i \in \mathbb{N}$ and $t \in \mathbb{R}_+$. Actually, we will even prove the large deviation principle in $(D_T^d, \mathcal{D}_T^d, U)$, which is stronger. To this end, we introduce the notation

[7WT]-1 

x{t) := X +a; ') t & [0,T], x € D^. 

k—1 

Then by Theorem 4.2.13 of [l5| and Lemma l5.3l below we can equally well show a large deviation principle 
for = y _|_ y.^v ,2 _|_ yAr,3 same principle will then hold for . But this is proved in Lemma ISTSl 
That the rate function for is convex with unique minimum 0 at 0 and can only be finite for 

functions in AC^ ^ is inherited from the rate function of Y^. □ 

Lemma 5.3. For each d G N and c = 1,2,3, the D^-valued processes Y^’‘‘ and Y^’‘‘ are exponentially 
equivalent, that is, for all e G (0,1) we have 


1 


lim , , 

N^oo J[N) 


logP 


sup sup \YYit)-Y:^’\t)\> 


N,ij 


te[0,T] 

Proof. We start with t = 1. Writing i = [7(7V)t]/7(A^) and diag(a) := a — , we obtain 

YYit)-YYit) 


sup sup 

tG[0,T] 


< sup 

t6[0,T] 




-a s^N,P^dmg{a^’'^)(s-r)f^N,C^X (Y^ 


(a^’^)^(A'''(r) -E[A'''(r)])dr ds 


0 JO 


+ sup 

te[o,T] 


e-a''s^Ar.Pediag(a'^''^)(s-r)(^Ar.C)X(^iV(^) _]£j^7V(^)])d^ ds 


t Jo 


(5.15) 


We can proceed with these two terms separately. Since |e“~* — e“~*|oo < Wae’'“^/ 7 (A^), we have for the 
first term in (j5.15|) 


sup 

te[o.T] 


sup |a^’P(a^’C)>^(X^(t)-E[X^(t)])| > 


< 


< 


) -E[A^(r)])drds 

n{N) 


> e 


ie[0.T] 






e-f{N) 


We note that := X^ — E[A^] satisfies the integral equation 


=:p(iV). 


Hence we have 


^^(i)= f a^’^^^{s)ds + p^'^M{t), tGM+. 

^0 

{^^nt)< f \a^’^\{er{s)ds+{pYCMnt), tGM+, 
Jo 
or after $n \in \mathbb{N}$ iterations,


nl ml 

m—O 

„»).,), f. _ g 


m—0 


where the last line holds when $n$ is large enough such that $(v_a T)^n / n! < 1$. It is not difficult to recognize that the exact value of $n$ only affects some constants in the subsequent arguments with no impact on the final result; we therefore assume without loss of generality that $n = 1$ (i.e. $v_a T < 1$). Then


p{N) < P 


l{N) 




i=i fc/j 1=1 




< 


7(Ar) 


I Af.C,7 ^ e7(-^)(l ~ ■'^aT) 


l{N) ZGN tG[0,T] 


Let $\lambda(N)$ be positive numbers to be chosen later. Using the independence of the Lévy processes $M_i$ and Doob's maximal inequality, we arrive at


p{N) < exp 

< exp ( — 


eX{N)^{N){l-v,T) 

qi{N)va{e-^^YT^ 

eX{N)^{N){l-VaT) 


exp 


tW 

n® 

m—1 
7(^) / 

n 


\7(^) /eN 


m—1 


exp 


(g7Ssup|p^®|M^(T) 

zeN 


< exp ^- 


eX{N)^iN){l - VaT) 
qi{N)va{e'’-^)^T^ 


exp I T^{N)'^q 


X{N) 


l{N) i, 


sup Ip, 




iV,C| 
Im I 


Now define 


A(fV):=7(iV)^o'(l)/ sup |p;^^| 

\ l,m£N 


N en. 


Since dzo is a convex function, its inverse 4 'q ^ is concave and therefore we have for large N that X{N) > 




that 


g( 7 (iV))/ ^sup; „jgisj , which increases to infinity with N. With this choice of A(iV) it follows 


lim 


1 


■logp(iV) = -oo, 


N^oc 7(iV) 

which completes the proof for the first term in (5.15). The second term can be treated in an analogous way: now the factor $\gamma(N)$ does not come from the difference of the two matrix exponentials, but from the domain of integration. The details are left to the reader.

For $\iota = 2$ similar methods apply. Also here we do not give the details. Instead, we sketch the proof for $\iota = 3$ where some modifications are necessary. Recalling the meaning of $\Gamma(N)$ from (A2), we have
ct 


sup sup 
ie[0,T] i=l,....d 




A6,3/ 


(e“ ‘-e“ g 


< sup sup 

6G[0,T] i=l,...,d 

+ sup sup 
te[0,T] 


< |e“ * — e“ ‘loo sup sup 


V ’ dA/(s) 


dM(s) 


+ e “ sup sup 

tG[0,T] i=l,...,r(6V) 


tg[0.T] i=l....,r(Ar) 

t 

-a'^s N,P 


p‘^’PdM(s) 

dM(s) 


't 


(5.16) 






































We can again consider these two terms separately. For the first one we have 

rt 


|e“ * — e“ *|oo sup sup 

te[o,T] i=i,...,r(Ar) 

•t 


e-“"V^>PdM(s) 


< r(iv) sup ] 

i=l,....r(Ar) 


sup 

tG[0,T] 


'0 


i=l,....r(Ar) 


'0 

dM(s) 

-r{N) 


> e 


> 


ej{N) 




<r(iv) _s.p 


7(A^) 


.r,v) sup »p(-d(^)n. 


exp I A(iV) sup 
ieN 

A(iV) 


Jo 

Jo X_1 


exp 


7(iV) 


sup 


'0 


Now recall that the stochastic integral in the last line has an infinitely divisible distribution. Moreover, the larger the integrand, the larger the exponential moment is. Since the integrand above is uniformly bounded in $i$ and $k$ by our hypotheses, the stochastic integral above can be replaced by some constant times $M_k(T)$ for the further estimation. Therefore, the remaining calculation can be completed as in the case $\iota = 1$. For the second term in (5.16) the reasoning is the same, except that the factor $\gamma(N)$ is now due to the domain $[\bar t, t]$ of the stochastic integral, where $\bar t = [\gamma(N) t]/\gamma(N)$ denotes the discretized time point. Observe at this point that $M_k(t) - M_k(\bar t)$ has the same distribution as $M_k(t - \bar t)$ and that $|t - \bar t| \le 1/\gamma(N)$. Again, we do not carry out the details. □

Lemma 5.4. For each l = 1,2,3 the processes € N) form an exponentially tight sequence 

in {D^,'D^,U), that is, for every L G K.+ there exists a compact subset of (with respect to the 
uniform topology U) such that 

^logP[y^’‘^ iF^]<-L. 

Proof. We first consider i = 1. We will adapt the idea of Lemma 4.1 in [l^ to our setting. As shown 
in part (I) of the proof there, it suffices to show that for every a,e G (0,oo) there exist a compact set 
H C Hy, some C G (0, oo) and n € N such that for all A > n 


]P[d(Y^’\H) > e] < 


(5.17) 


where d{f,H) := inf{supjg[Q-j.] supj^;i^ \ — gi{t)\ ■ 9 £ H} ^ ^t- order to prove (I5.17|) . we 

first define for n G N and A C 

[7(")T]-1 

ddn (A) . ^ f G Drp . f ^ [ K «+i "j T ^[7(n)T] fl [ ^ 7 ^]; ; • ■ • ; ^[7(n)T] ^ A 

^’ 7(ti) I l7('>)’ J 


K — 1 


It follows from Equation (4.3) of 12 1 that for A > n, A C and / G Hn{A) we have 


d{f,Hn{A)) < sup sup sup 

K=0,...,[7(n)T]-l;^g[l :iW_|_ 7 ^ i=l,...,d 


h 


r7(^f)Kl I \ 

7(A) 


AT-/. 


r77V>i 

I- 7(n) J 


Next, define K := [—1, l]'^. Then for every /3 G (0, oo) and N > n we have 

F[d{Y^’\HniPK)) > e] < ^ Hn{(3K)]+F[Y^’^ G Hn^K), d{Y^’\H4(3K)) > e] 


(5.18) 


(5.19) 
The first probability is bounded as follows:


^ ffN(/3K)] =P 


sup sup 

i=l,...,dK=l,...,[j(N)T] Jo 


x^a^'C(X/^(r)-E[X,^(r)])drds 

l^k 


/O 


E 


'(fc/7W-s)^JV,Pga"^C(s-r) 

jk 


j,k^l 

>P 


< 


sup sup Y (0 “ (t)])| > ^ 


... =-.p'{N). 

ie[0,T] 

By the same arguments as in Lemma 5.3 one obtains (again assuming $v_0 T < 1$ without loss of generality)
p'{N) < 2^-^ exp sup ^ 


liN) i, 


m^N 


We choose $\lambda(N) := \gamma(N)$ this time. Then we can make $\log(p'(N))/\gamma(N)$ arbitrarily small uniformly for large $N$ by varying the value of $\beta$.

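For the second step of the proof of (5.17) we conclude from (5.18) that

$$\mathbb{P}[Y^{N,1} \in H_N(\beta K),\ d(Y^{N,1}, H_n(\beta K)) > \epsilon] \leq \mathbb{P}\left[ \sup_{\kappa=0,\ldots,[\gamma(n)T]-1}\ \sup_{\lambda \in \left[1, \frac{\gamma(N)}{\gamma(n)}+1\right]}\ \sup_{i=1,\ldots,d} \left| Y_i^{N,1}\left( \frac{[\gamma(N)\kappa/\gamma(n)] + \lambda}{\gamma(N)} \wedge T \right) - Y_i^{N,1}\left( \frac{[\gamma(N)\kappa/\gamma(n)]}{\gamma(N)} \right) \right| > \epsilon \right]. \tag{5.20}$$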


For the further procedure, we split the difference in the last line into two terms in the same way as in (5.15). We only treat the corresponding first term. As before, the other one can be estimated similarly.
Introducing the notation $t^{N,1}$ (resp. $t^{N,2}$) for the time point in the first (resp. second) parenthesis of (5.20), and observing that $0 \leq t^{N,1} - t^{N,2} \leq 2/\gamma(N) + 1/\gamma(n)$, we obtain


sup sup sup 


^N + N,ti N.N,r 

e“ _e“ 


[ [ e-“'^"a^’Pe'^*’^s(“'^’'')(*-’’)(a^’^)^(A^(r)-E[A^(r)])drds 
Jo Jo I 


> e 


< 


sup sup 

te[0,T] i=l....,d 


(t) -E[Af (t)]) 

j=i k^j 


> 


;a(e’'“^)3r2 ( 


2 I 1 ) 

7 (Tvy 7(n) J . 


< 2^Wexp - 


eA(iV)(l-u„T) 


( \ I I 


exp fT7(iV)vI'o sup 


7 (iV) 


Lm£N 


-^Im 


where the last line follows in similar fashion as before. With $\lambda(N) := \gamma(N)$ we can make, by taking $n$ large enough, the logarithm of the last term divided by $\gamma(N)$ arbitrarily small for $N > n$. This finishes the proof for $\iota = 1$. The case $\iota = 2$ is analogous, while for $\iota = 3$ the line of argument remains the same in principle, with slight changes to account for the discretization of Lévy processes, cf. the proofs of Lemma 5.3 and Lemma 4.1 of [12]. □


Lemma 5.5. The process $(Y_i^N : i = 1,\ldots,d)$ satisfies a large deviation principle in $(D_T^d, \mathcal{D}_T^d, U)$ with a good convex rate function $I_d\colon D_T^d \to [0,\infty]$ such that $I_d(x) < \infty$ implies $x \in AC_T^d$. Moreover, we have $I_d(0) = 0$ and this minimum is unique.


Proof. We apply the abstract Gärtner-Ellis theorem of [12], Theorems 2.1 and 2.4, to $Y^N$ and prove the following steps.
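(1) The laws of $Y^N$, $N \in \mathbb{N}$, are exponentially tight in $(D_T^d, \mathcal{D}_T^d, U)$.

(2) For all $\theta \in M_T^d$ the limit $\Lambda(\theta) = \lim_{N\to\infty} (\gamma(N))^{-1} \Lambda_N(\gamma(N)\theta)$ exists, where

$$\Lambda_N(\theta) := \log \mathbb{E}\left[ \exp\left( \sum_{i=1}^d \int_0^T Y_i^N(t)\,\theta_i(dt) \right) \right].$$

(3) The mapping $\Lambda$ is $D_T^d$-Gateaux differentiable, in the sense that for all $\theta \in M_T^d$ there exists $x^\theta \in D_T^d$ such that for all $\eta \in M_T^d$

$$\delta\Lambda(\theta;\eta) := \lim_{\epsilon\to 0} \frac{\Lambda(\theta+\epsilon\eta) - \Lambda(\theta)}{\epsilon} = \sum_{i=1}^d \int_0^T x_i^\theta(t)\,\eta_i(dt). \tag{5.21}$$

Part of the claim is that the limit in (5.21) exists. Moreover, we have $\delta\Lambda(0;\eta) = 0$ for all $\eta \in M_T^d$.

(4) We have $\{x \in D_T^d : \Lambda^*(x) < \infty\} \subset AC_T^d$, where

$$\Lambda^*(x) := \sup_{\theta \in M_T^d} \left( \sum_{i=1}^d \int_0^T x_i(t)\,\theta_i(dt) - \Lambda(\theta) \right), \quad x \in D_T^d.$$

(5) For every $a \in \mathbb{R}_+$ the set $\{x \in D_T^d : \Lambda^*(x) \leq a\}$ is compact in $(D_T^d, \mathcal{D}_T^d, U)$.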

Part of the Gärtner-Ellis theorem is that the rate function $I_d$ is given by $\Lambda^*$, the convex conjugate or Fenchel-Legendre transform of $\Lambda$. Since $\Lambda$ is a convex function in $\theta$ satisfying (3), the conjugate $\Lambda^{**}$ of $\Lambda^*$ is again $\Lambda$, see Theorem 12 of [33]. Thus, by the first corollary to Theorem 1 in [2], we have $I_d(0) = \Lambda^*(0) = 0$ and this minimum is unique.
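To make the role of the Fenchel-Legendre transform concrete, here is a minimal one-dimensional sketch. It assumes a toy choice $\Lambda(\theta) = \theta^2/2$ (the Gaussian case), which is not the $\Lambda$ of this proof; the function names and grids are likewise illustrative. In this toy case the transform is $x^2/2$, which vanishes only at $x = 0$.

```python
import numpy as np

def legendre_transform(log_mgf, xs, thetas):
    """Grid approximation of Lambda*(x) = sup_theta (theta * x - Lambda(theta))."""
    values = np.outer(xs, thetas) - log_mgf(thetas)  # rows index x, columns index theta
    return values.max(axis=1)

def gaussian_log_mgf(theta):
    """Toy choice Lambda(theta) = theta^2 / 2 (an assumption, not the paper's Lambda)."""
    return 0.5 * theta ** 2

xs = np.linspace(-2.0, 2.0, 9)           # points at which the rate function is evaluated
thetas = np.linspace(-5.0, 5.0, 2001)    # grid over which the supremum is approximated
rate = legendre_transform(gaussian_log_mgf, xs, thetas)
```

On this grid `rate` agrees with `0.5 * xs**2` up to discretization error, and its minimum is attained only at the grid point `x = 0`.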

Let us now prove (1)-(5) above. Part (1) has been proved in Lemma 5.4. For (2) we first compute $\Lambda_N$. For all $\theta \in M_T^d$ we have (recall that $\underline{t} := [\gamma(N)t]/\gamma(N)$)


Aw(7(iV)0) =logE 


d „T / pi ps 


[ if [ aW Z! ’ ^"”’'VZdM^(r)ds 

\ .-I ^0 V ^0 ^0 . K ,_n 


\ JO JO 

00 pi 




+ 7(fV) ^ 
j,k=l 

l(N) 

= ^ logE 


'0 


%■ "V^’^dMfc(s) ] 9i{dt) 


/ d pT / pt ps 00 

exp [ ^ 


. 2=1 


/o \ Jo Jo 


E A(^)e 

j,k,l=l 

7(^)E®b dM^(s)'j 9,{dt) 


(t-s) Ar,p a^’^(s-r) N,C 




i=i 


Pzm dM^(r)ds 

(5.22) 


by the independence of the processes Mm- By a stochastic Fubini argument (see Theorem IV.65 of [H), 
the term within the exponential in the previous line can also be written as ($\overline{s}$ denotes the smallest multiple of $1/\gamma(N)$ that is larger than or equal to $s$)

Jg ij^ J- + - r) 9^{dt) \ Mm{dr), (5.23) 

and has an infinitely divisible distribution such that its logarithmic Laplace exponent in (5.22) is explicitly known. Denoting the parenthesis in (5.23) by $H_m^N(\theta, r)$, it is given by $\int_0^T \Psi_m(H_m^N(\theta, r))\,dr$. We claim that this term converges uniformly in $m$ to $\int_0^T \Psi_m(H_m(\theta, r))\,dr$. Indeed, by the dominating property of $M_0$, the claim follows as soon as we can prove that $H_m^N(\theta, r) \to H_m(\theta, r)$ as $N \to \infty$, uniformly in $m \in \mathbb{N}$ and $r \in [0,T]$. This in turn follows from


H^{e,r) - HZiO,r)\ 

T pT 


< 


[ f - s,s - r)ei{dt)ds + [ f ^ Gjmjt - s,s-r) 6>^(dt) ds 

Jf Js Jr Js 

f / s,s-r) - G^^{t-s,s-r))e,{dt)ds 

Jr Js 

[ [ J2^G^^{t-s,s-r)-Gg,{i-s,s-r))ei{dt)ds 

Jr Js 

pf- d p'Y d 

/ '^Rini{t-r)Oi{dt) + / - r) - Rf^{t - r)) Oi{dt) 

pT d 

-r)- R^^d-r)) e,{dt) 


2=1 


< d sup sup |G'im(t, s)| [ 7 (iV) ^ sup |0i|([O,r])+ sup [ |6>i|([s, s)) ds ) 
2 ,m£N s,tG[ 0 ,T] \ 2 = 1 ,...,d i=l,...,d Jr J 


+ d sup |6>i|([0,T]) T sup sup s) - G^(t, s)| + 


2=1,...,d 


2,meN s,tG[0,T] 


sup sup |G,^(t,s)| 
) N,i,mGNs,t€[Q,T] 


Va 


+ 7 (iV) ^ sup sup + sup sup \Rij{t) - Rfj{t)\ d- ““ sup sup \Rf'j{t)\ 


*jeN te[o.T] 


ijeN te[o.T] 


77''1 N,i,jeNte[o,T] 


where all terms converge to 0 by hypothesis independently of $m$ and $r$. For the second summand one has to notice that the integral term equals $\int_r^T \int_{\underline{t}}^{t} 1\,ds\,|\theta_i|(dt)$ and thus converges to 0 uniformly in $i$ and $r$ with rate $1/\gamma(N)$. Since the value of Cesàro sums remains unchanged under uniform approximations, it follows from assumption (A8) of Theorem 4.1 that


$$\Lambda(\theta) = \lim_{N\to\infty} \frac{1}{\gamma(N)} \sum_{m=1}^{\gamma(N)} \int_0^T \Psi_m(H_m(\theta, r))\,dr.$$
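The remark that Cesàro sums are insensitive to uniform approximations rests on the elementary bound that two arithmetic means differ by at most the largest termwise difference. A minimal sketch follows; the random numbers are hypothetical stand-ins for the integrals appearing in the sum, and the helper names are not from the source.

```python
import numpy as np

def cesaro_mean(terms):
    """Arithmetic mean (1/n) * sum of the n given terms."""
    return float(np.mean(terms))

def cesaro_gap(approx_terms, limit_terms):
    """Return (|difference of the two Cesaro means|, sup over m of |approx_m - limit_m|)."""
    approx_terms = np.asarray(approx_terms, dtype=float)
    limit_terms = np.asarray(limit_terms, dtype=float)
    gap = abs(cesaro_mean(approx_terms) - cesaro_mean(limit_terms))
    sup_error = float(np.max(np.abs(approx_terms - limit_terms)))
    return gap, sup_error

rng = np.random.default_rng(0)
limit_terms = rng.uniform(0.0, 1.0, size=1000)  # stand-ins for the limiting integrals
results = []
for n in (10, 100, 1000):
    # Perturb each term by at most 1/n, mimicking a uniform approximation error.
    perturbation = rng.uniform(-1.0 / n, 1.0 / n, size=n)
    results.append(cesaro_gap(limit_terms[:n] + perturbation, limit_terms[:n]))
```

In every entry of `results` the gap between the means is bounded by the uniform error, so a uniform error tending to 0 cannot change the limit of the averages.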


Next, we prove the $D_T^d$-Gateaux differentiability of $\Lambda$. First, regarding the existence of $\delta\Lambda(\theta; \eta)$ in (5.21), we note that the mappings $M_T^d \to L^\infty(\mathbb{N} \times [0,T])$, $\theta \mapsto (H_m(\theta, r) : m \in \mathbb{N}, r \in [0,T])$, are continuous linear operators and therefore Fréchet differentiable, which is stronger than Gateaux differentiability. Together with the fact that $\Psi_m$ is differentiable with locally bounded derivative, and $c_m \leq c_0$ and $v_m \leq v_0$ for all $m \in \mathbb{N}$, this implies that for every $\theta$ and $\eta$

$$\epsilon^{-1} \int_0^T \big( \Psi_m(H_m(\theta + \epsilon\eta, r)) - \Psi_m(H_m(\theta, r)) \big)\,dr$$

converges uniformly in $m \in \mathbb{N}$ as $\epsilon \to 0$. This in turn proves the Gateaux differentiability of $\Lambda$. Moreover, it enables us to compute the derivative explicitly. Using the chain rule for Fréchet derivatives, we obtain

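$$\begin{aligned}
\delta\Lambda(\theta;\eta) &= \lim_{N\to\infty} \frac{1}{\gamma(N)} \sum_{m=1}^{\gamma(N)} \int_0^T \lim_{\epsilon\to 0} \frac{\Psi_m(H_m(\theta+\epsilon\eta, r)) - \Psi_m(H_m(\theta, r))}{\epsilon}\,dr \\
&= \lim_{N\to\infty} \frac{1}{\gamma(N)} \sum_{m=1}^{\gamma(N)} \int_0^T \Psi_m'(H_m(\theta, r))\,H_m(\eta, r)\,dr \\
&= \lim_{N\to\infty} \frac{1}{\gamma(N)} \sum_{m=1}^{\gamma(N)} \int_0^T \Psi_m'(H_m(\theta, r)) \sum_{i=1}^d \left( \int_r^T \int_s^T G_{im}(t-s, s-r)\,\eta_i(dt)\,ds + \int_r^T R_{im}(t-r)\,\eta_i(dt) \right) dr \\
&= \lim_{N\to\infty} \frac{1}{\gamma(N)} \sum_{m=1}^{\gamma(N)} \sum_{i=1}^d \int_0^T \left( \int_0^t \int_0^s \Psi_m'(H_m(\theta, r))\,G_{im}(t-s, s-r)\,dr\,ds + \int_0^t \Psi_m'(H_m(\theta, r))\,R_{im}(t-r)\,dr \right) \eta_i(dt) \\
&= \lim_{N\to\infty} \sum_{i=1}^d \int_0^T \int_0^t \frac{1}{\gamma(N)} \sum_{m=1}^{\gamma(N)} \left( \int_0^s G_{im}(t-s, s-r)\,\Psi_m'(H_m(\theta, r))\,dr + R_{im}(t-s)\,\Psi_m'(H_m(\theta, s)) \right) ds\,\eta_i(dt),
\end{aligned}$$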

where all interchanges of integration, summation and taking limits are justified by dominated convergence. From the last line we deduce the existence of $x^\theta$ satisfying (5.21). Since $H_m(0, r) = 0$ and $\Psi_m'(0) = 0$, we have $\delta\Lambda(0; \eta) = 0$ identically.

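Next, we demonstrate (4), namely that $\Lambda^*$ only assumes finite values on the set $AC_T^d$, that is, $\Lambda^*(x) < \infty$ implies that for every $\epsilon \in (0,\infty)$ there exists $\delta \in (0,\infty)$ such that $\sum_{i=1}^d \sum_{j=1}^n |x_i(b_j) - x_i(a_j)| < \epsilon$ whenever $n \in \mathbb{N}$, $0 \leq a_1 < b_1 \leq \ldots \leq a_n < b_n \leq T$ and $\sum_{j=1}^n (b_j - a_j) < \delta$. In order to do so, we follow the strategy of proof in [12], Theorem 3.1. We consider $\theta_i := \sum_{j=1}^n \xi_i^j (\delta_{b_j} - \delta_{a_j})$, where $\xi_i^j \in \mathbb{R}$ is arbitrary. Then we evidently have $\theta_i((r,T]) = \sum_{j=1}^n \xi_i^j 1_{[a_j, b_j)}(r)$. Denoting $C_T := T \sup_{i,m\in\mathbb{N}} \sup_{s,t\in[0,T]} |G_{im}(t,s)| + \sup_{t\in[0,T]} \sup_{i,m\in\mathbb{N}} |R_{im}(t)|$, it follows that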


T T / ^ \ 

A(6>) < sup [ 'iim{Hm{0,r)dr < [ 4'o ( Gt ^ 6>i((r, T]) ) dr 
meNJo JO y J 


rT ^ 


= I X^o 1 ‘^'rX^n lK.b,)Wdr ^^jup^^-o ( Gt^IC^I J - Gj) 


i=i 


i=i / j=i 


=:G(T,|| 5 i||i,...,||nii)E(^^--«^)’ 

i=i 

where ||C'’ ||i := ICil- a consequence, we deduce from the definition of A* that for all r G (0, oo) 
and < T 

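$$\sum_{i=1}^d \sum_{j=1}^n \xi_i^j \big( x_i(b_j) - x_i(a_j) \big) \leq C(T, \tau, \ldots, \tau) \sum_{j=1}^n (b_j - a_j) + \Lambda^*(x).$$

Taking $\xi_i^j$ as $\tau$ times the sign of $x_i(b_j) - x_i(a_j)$, it follows that

$$\sum_{i=1}^d \sum_{j=1}^n |x_i(b_j) - x_i(a_j)| \leq \tau^{-1} C(T, \tau, \ldots, \tau) \sum_{j=1}^n (b_j - a_j) + \tau^{-1} \Lambda^*(x). \tag{5.24}$$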

If $\Lambda^*(x) < \infty$, we can now choose $\tau$ first and then $\delta$ to make the left-hand side arbitrarily small.
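The trade-off between the two parameters in (5.24) can be made explicit in a toy case. The sketch below assumes, purely for illustration, that $C(T, \tau, \ldots, \tau) = (c\tau)^2/2$ for some constant $c$ (quadratic growth, as a Gaussian-type exponent would give); this form, the constant `c_const`, and the value used for $\Lambda^*(x)$ are hypothetical and not taken from the source. Under this assumption the right-hand side of (5.24) is minimised at $\tau = \sqrt{2\Lambda^*(x)/\delta}/c$ and equals $c\sqrt{2\Lambda^*(x)\delta}$, which tends to 0 with the total interval length $\delta$.

```python
import math

def ac_bound(tau, total_length, level, c_const=1.0):
    """Right-hand side of (5.24), assuming C(T, tau, ..., tau) = (c_const * tau)^2 / 2."""
    growth = 0.5 * (c_const * tau) ** 2
    return (growth * total_length + level) / tau

def optimal_tau(total_length, level, c_const=1.0):
    """Minimiser of tau -> ac_bound(tau, ...); requires level > 0 and total_length > 0."""
    return math.sqrt(2.0 * level / total_length) / c_const

level = 1.0  # hypothetical value of Lambda*(x)
bounds = []
for total_length in (1e-2, 1e-4, 1e-6):  # total length of the intervals (a_j, b_j)
    tau = optimal_tau(total_length, level)
    bounds.append(ac_bound(tau, total_length, level))
```

The entries of `bounds` decrease like the square root of the total interval length, illustrating why a large $\tau$ combined with a small $\delta$ makes the left-hand side of (5.24) small.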

It only remains to prove (5), the compactness of the level sets of $\Lambda^*$. By step (4) and the lower semicontinuity of $\Lambda^*$, its level sets are closed subsets of $AC_T^d$. Thus, the Arzelà-Ascoli theorem provides a compactness criterion. First, observe that for all $t \in [0,T]$ we have for $x \in AC_T^d$ with $\Lambda^*(x) \leq a$ that

$$\sum_{i=1}^d |x_i(t)| = \sup_{\theta \in \Theta_t} \sum_{i=1}^d \int_0^T x_i(s)\,\theta_i(ds) \leq a + \sup_{\theta \in \Theta_t} \Lambda(\theta) < \infty,$$

where $\Theta_t$ is the finite collection of all $\theta$ for which each coordinate is either $\delta_t$ or $-\delta_t$. Second, for the proof of the uniform equicontinuity of the functions $x \in AC_T^d$ with $\Lambda^*(x) \leq a$, we recall from (5.24) that

$$\sum_{i=1}^d |x_i(t) - x_i(s)| \leq \tau^{-1} C(T, \tau, \ldots, \tau)(t - s) + \tau^{-1} a,$$

which converges to 0 independently of $x$ when $s \uparrow t$ and $\tau \to \infty$. □